Cooperative Environmental Monitoring 
for PTZ Visual Sensor Networks: 
A Payoff-based Learning Approach 

Takeshi Hatanaka, Member, IEEE, Yasuaki Wasa and 
Masayuki Fujita, Member, IEEE


Abstract

This paper investigates cooperative environmental monitoring for Pan-Tilt-Zoom (PTZ) visual sensor
networks. We first present a novel formulation of the optimal environmental monitoring problem, whose
objective function is intertwined with the uncertain state of the environment. In addition, due to the large
volume of vision data, it is desired for each sensor to execute processing through local computation and
communication. To address the issues, we present a distributed solution to the problem based on game
theoretic cooperative control and payoff-based learning. At the first stage, a utility function is designed so
that the resulting game constitutes a potential game with potential function equal to the group objective
function, where the designed utility is shown to be computable through local image processing and
communication. Then, we present a payoff-based learning algorithm so that the sensors are led to the
global objective function maximizers without using any prior information on the environmental state.
Finally, we run experiments to demonstrate the effectiveness of the present approach.

I. Introduction

Large-scale environmental monitoring to reveal environmental states has become crucial due to 
recent serious natural disasters including earthquakes, tsunamis, nuclear meltdowns, landslides, 
typhoons/hurricanes and so on. In the task, it is in general required to collect dense data in real
time over a widespread environment. As a solution to the issue, sensor networks have emerged over
the past few decades and have been extensively studied. Moreover, mobile/robotic sensor networks
have also been deeply investigated as a key technology to enhance data collection efficiency [1].

Among a variety of sensors available for the monitoring task [1], this paper focuses on vision 
sensors. In particular, we consider a sensor network consisting of spatially distributed cameras, 
which is called a camera/visual sensor network [2]. Then, we need to take account of the following
nature of vision sensors: (i) the volume of data tends to be larger than that of other sensors, (ii) vision
sensors do not provide explicit physical data, and (iii) vision sensors are inherently heterogeneous,
which means that, even if the quality of two sensors is the same, the quality of their measurements of
a common point can differ depending on the location of the point relative to the camera frames.

In this paper, we investigate a distributed/cooperative optimal monitoring strategy for a network 
of Pan-Tilt-Zoom (PTZ) cameras by controlling the camera parameters, which are called actions 
in this paper. Note that the optimal action in the problem depends on the unknown
environmental state. Accordingly, we have to solve the optimization problem under the restriction:
(iv) each vision sensor has no access to the reward brought about by an action before the action is 
actually executed. Optimization under (iv) has been deeply studied in the field of reinforcement 
learning and simulated annealing. However, these algorithms are centralized and might not be 
available due to the nature (i). 

This paper first formulates a novel optimal environmental monitoring problem for PTZ visual 
sensor networks reflecting the nature (ii) and (iii), where we let the objective function rely on the 
amount of information contained in the sensed data and the quality of the measurement. We
next present a distributed solution to the problem leading the sensors to the globally optimal
actions under the restriction of (iv). To meet the requirements, this paper employs techniques
in game theoretic cooperative control originally presented in [3] since it provides a systematic
design procedure of cooperative control for heterogeneous networks as stated in (iii). Following
the procedure of [3], we first constitute a potential game [3] with the potential function equal
to the global objective function through an appropriate utility design technique [4]. Then, as a
technical tool to address (iv), we employ an action selection rule called payoff-based learning
[5]-[8], where each player chooses its action based only on the past experienced payoffs. In
particular, we present a novel payoff-based learning algorithm which guarantees convergence in 
probability to the potential function maximizers, which are equal to the global objective function 
maximizers. Finally, we run experiments on a visual sensor network testbed to demonstrate the 
effectiveness of the present approach. 

Related Works and Contributions 

Due to the nature of vision sensors (i), distributed processing over visual sensor networks has
been actively studied in some recent papers. Cooperative estimation over visual sensor networks
is studied in [9], [10], [11], distributed localization/calibration is investigated in [11], [12], and
distributed sensing strategies are presented in [13], [14], [15]. In particular, the scenarios and
approaches in [13], [14] are closely related to this paper and hence they will be mentioned later.

The objective of this paper is related to coverage control [16]-[21] whose objective is to
deploy mobile sensors efficiently via distributed decision-making. A gradient descent approach
widely used in the literature [16], [17] is implementable even under the restriction (iv). However,
the approach is not always directly applicable to the problem of this paper due to the nature of
vision sensors (ii) and (iii). More importantly, the gradient descent approach leads sensors to a
configuration achieving local maxima of some group objective function, but such a configuration
does not always globally maximize the objective function.

Persistent monitoring has also been studied recently, e.g. in [22]-[25], which differs from coverage
in the perpetual need to cover a changing environment [24], [25]. However, to the best of
our knowledge, there are few works fully taking account of the nature of vision sensors. In
addition, while most of the works [23]-[25] assume information accumulation/decay models
and availability of the model, this paper does not presume such models.

The papers [3], [13], [14], [21] are most directly related to this paper, where the authors
investigate potential game theoretic approaches to coverage control or collaborative sensing. The
algorithm presented in [3] guarantees that players eventually take the globally optimal action with
high probability. However, it presumes availability of future payoffs prior to action executions
and hence cannot be implemented under (iv). [14] presents a payoff-based learning algorithm
and applies it to coverage for visual (mobile) sensor networks. However, the algorithm does
not always lead sensors to the globally optimal actions and the nature (ii) and (iii) are not fully
addressed. Similar statements are also true for [21]. Meanwhile, [13] mentions how to use the
visual measurement in the process explicitly. Though the authors utilize a learning algorithm
assuming a future payoff, they successfully avoid the issue (iv) by constructing the future virtual
utility from the estimate of the target states produced by a distributed filter. However, the approach
may limit applications since there might be no explicit target in some scenarios.

We finally mention the contribution of the present learning algorithm. The algorithm is
regarded as a variation of [5] and [14]. [5] guarantees that potential function maximizers are
eventually selected with high probability. Meanwhile, [14] has advantages over [5] in that the action
selection rule is simpler and convergence in probability is rigorously guaranteed, but it does not
always lead sensors to potential function maximizers. The contribution of the present algorithm
is to embody the advantages of these two algorithms, i.e. it guarantees convergence in probability to
potential function maximizers while maintaining the simple structure of [14].

The contributions of this paper are summarized as follows: 

• a novel problem formulation of environmental monitoring for PTZ visual sensor networks 
taking account of the nature of vision sensors (ii) and (iii) is presented, 

• a novel simple payoff-based learning algorithm for potential games guaranteeing
convergence in probability to the potential function maximizers is proposed, and

• the approach is demonstrated through experiments, while such efforts are not always fully 
made in the existing works on game theoretic cooperative control. 

II. Visual Sensor Networks and Environment
A. Visual Sensor Networks and Environment 

In this paper, we consider the situation illustrated in Fig. 1 where $n$ Pan-Tilt-Zoom (PTZ)
vision sensors $\mathcal{V} = \{v_1, \cdots, v_n\}$ monitor environment modeled by a collection of $m$ polygons
$\mathcal{R} = \{r_1, \cdots, r_m\}$. Let the set of position vectors of all points in $r_j \in \mathcal{R}$ relative to a world
frame $\Sigma_w$ be denoted by $\mathcal{Q}_j$. In the following, we also use the notation $\mathcal{Q} = \bigcup_{r_j \in \mathcal{R}} \mathcal{Q}_j$.

Suppose that each PTZ vision sensor $v_i \in \mathcal{V}$ can adjust its horizontal (pan) angle $\theta_i \in \Theta_i \subseteq
[-\pi, \pi]$, vertical (tilt) angle $\varphi_i \in \Phi_i \subseteq [0, \pi]$ and focal length $\lambda_i \in \Lambda_i$ (Fig. 2), where $\Theta_i$, $\Phi_i$ and
$\Lambda_i$ are assumed to be finite sets. Throughout this paper, the notation

$$a_i = (\theta_i, \varphi_i, \lambda_i) \in \mathcal{A}_i := \Theta_i \times \Phi_i \times \Lambda_i$$

is called an action of sensor $v_i \in \mathcal{V}$, and $a = (a_i)_{v_i \in \mathcal{V}} \in \mathcal{A} := \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$ is called a joint
action. A collection of actions other than $v_i$ is denoted as $a_{-i} := (a_1, \cdots, a_{i-1}, a_{i+1}, \cdots, a_n)$.

Fig. 1. Targeted scenario (figure labels: Visual Measurement, Image Plane)

Fig. 2. Actions of vision sensor $v_i$

Once an action $a_i$ is fixed, the orientation of sensor $v_i$'s frame $\Sigma_i$ relative to $\Sigma_w$ and its maximal
view angle are uniquely determined, which are respectively denoted by $R_i(a_i) \in SO(3) := \{R \in
\mathbb{R}^{3 \times 3} \mid R^T R = I_3, \det(R) = +1\}$ and $\beta_i(a_i)$. The position of the origin of $\Sigma_i$ relative to $\Sigma_w$ is
also denoted by $p_i \in \mathbb{R}^3$. Then, the pose of sensor $v_i$ is represented as $g_i(a_i) = (p_i, R_i(a_i)) \in
SE(3) := \mathbb{R}^3 \times SO(3)$. In this paper, we assume that each $v_i \in \mathcal{V}$ is already calibrated and has
knowledge on the pose $g_i(a_i)$ for all $a_i \in \mathcal{A}_i$ and $\mathcal{Q}_j = \{q \in \mathbb{R}^3 \mid A^{eq}_j q = 1, A^{ineq}_j q \le \mathbf{1}\}$ for
all $r_j \in \mathcal{R}$, where $\mathbf{1}$ is a vector whose elements are all equal to 1.

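We also assume that, when each sensor $v_i \in \mathcal{V}$ takes action $a_i \in \mathcal{A}_i$, the actions selectable
at the next round are constrained by a subset $\mathcal{C}_i(a_i) \subseteq \mathcal{A}_i$ satisfying the following assumptions
which are in general satisfied in the scenario of this paper.

Assumption 1: The function $\mathcal{C}_i : \mathcal{A}_i \to 2^{\mathcal{A}_i}$ satisfies:

• For any $v_i \in \mathcal{V}$, $a_i \in \mathcal{A}_i$ and $a_i' \in \mathcal{A}_i$, the inclusion $a_i' \in \mathcal{C}_i(a_i)$ holds iff $a_i \in \mathcal{C}_i(a_i')$.

• For any $v_i \in \mathcal{V}$ and any actions $a_i, a_i' \in \mathcal{A}_i$, there exists a sequence of actions $a_i =
a_i^1, a_i^2, \cdots, a_i^{n_i} = a_i'$ satisfying $a_i^l \in \mathcal{C}_i(a_i^{l-1})$ for all $l \in \{2, \cdots, n_i\}$.

• For any $v_i \in \mathcal{V}$ and $a_i \in \mathcal{A}_i$, the number of elements in $\mathcal{C}_i(a_i)$ is greater than or equal to 3.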
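The three items of Assumption 1 can be checked mechanically for a given constraint map. The following is a minimal sketch, not taken from the paper: the grid-shaped action set and the helper names `neighbour_constraint` and `check_assumption_1` are hypothetical, and each action is assumed to belong to its own constraint set.

```python
from collections import deque


def check_assumption_1(actions, constraint):
    """Check the three items of Assumption 1 for one sensor.

    actions: list of hashable actions (the finite set A_i).
    constraint: dict mapping each action a to the set C_i(a).
    Returns (reversible, reachable, large_enough).
    """
    # Item 1: a' in C_i(a) holds iff a in C_i(a').
    reversible = all(
        (b in constraint[a]) == (a in constraint[b])
        for a in actions for b in actions
    )
    # Item 2: every action can be reached from every other action.
    # When item 1 holds, one breadth-first search from any action suffices.
    start = actions[0]
    seen, queue = {start}, deque([start])
    while queue:
        a = queue.popleft()
        for b in constraint[a]:
            if b not in seen:
                seen.add(b)
                queue.append(b)
    reachable = seen == set(actions)
    # Item 3: |C_i(a)| >= 3 for every action.
    large_enough = all(len(constraint[a]) >= 3 for a in actions)
    return reversible, reachable, large_enough


def neighbour_constraint(pans, tilts):
    """Hypothetical constraint: stay, or move one grid step in pan or tilt."""
    actions = [(p, t) for p in range(pans) for t in range(tilts)]
    valid = set(actions)
    constraint = {}
    for p, t in actions:
        candidates = {(p, t), (p - 1, t), (p + 1, t), (p, t - 1), (p, t + 1)}
        constraint[(p, t)] = candidates & valid
    return actions, constraint
```

On a 3 x 3 pan/tilt grid all three items hold under this hypothetical constraint, whereas a grid with only two actions violates the third item.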
B. Visual Measurements and Communication Structure

This subsection defines visual measurements of each vision sensor $v_i \in \mathcal{V}$ and communication
structures among sensors in $\mathcal{V}$.

Let us first denote the pixels of vision sensor $v_i \in \mathcal{V}$ by $\mathcal{S}_i := \{s_{i,l} \mid l \in \{1, \cdots, S_i\}\}$, and the
position vector of the center of $s_{i,l} \in \mathcal{S}_i$ relative to $\Sigma_i$ by $p_{i,l}(a_i)$ (Fig. 3). Then, once an action $a_i$
is fixed, $v_i$ obtains visual measurements (raw data) $y_{i,l}$ for each $s_{i,l} \in \mathcal{S}_i$. Here, $y_{i,l}$ is a 3D vector
for an RGB color image whose elements take integers in $\{0, \cdots, 255\}$, and $y_{i,l} \in \{0, \cdots, 255\}$
for a grey-scale image. Note that each $y_{i,l}$ is provided by one of the polygons $r_j \in \mathcal{R}$.

In this paper, a point $q \in \mathcal{Q}$ is said to be visible from sensor $v_i \in \mathcal{V}$ with action $a_i \in \mathcal{A}_i$ if

$$\mathrm{atan}(\|[b_x\ b_y]\| / b_z) \le \beta_i(a_i), \quad [b_x\ b_y\ b_z]^T := R_i(a_i)(q - p_i)$$

and there exists no pair of $q' \in \mathcal{Q}$ and $\alpha \in [0, 1)$ such that $\alpha(q - p_i) = q' - p_i$. By using the
notion, we also define the set of pixels of $v_i \in \mathcal{V}$ with $a_i \in \mathcal{A}_i$ capturing $r_j \in \mathcal{R}$, which is
denoted by $\mathcal{I}_{ij}(a_i)$ (Fig. 4). Here, a pixel $s_{i,l} \in \mathcal{S}_i$ is a member of $\mathcal{I}_{ij}(a_i)$ if and only if there
exists a visible point $q \in \mathcal{Q}_j$ such that the point $q$ projected onto the image plane is equal to the
center of the pixel $s_{i,l}$. Due to the knowledge of $g_i(a_i)$ and $\mathcal{Q}_j$, each $v_i \in \mathcal{V}$ can obtain $\mathcal{I}_{ij}(a_i)$
for every $a_i \in \mathcal{A}_i$. To be precise, a pixel $s_{i,l}$ is included in $\mathcal{I}_{ij}(a_i)$ if $r_j$ satisfies

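$$\alpha_{ij}(a_i) = \frac{1 - A^{eq}_j p_i}{A^{eq}_j R_i(a_i) p_{i,l}(a_i)} \in (0, \infty), \quad A^{ineq}_j \big(\alpha_{ij}(a_i) R_i(a_i) p_{i,l}(a_i) + p_i\big) \le \mathbf{1} \tag{1}$$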

and the parameter $\alpha_{ij}(a_i)$ is minimal among all polygons satisfying (1).

In addition, a polygon $r_j \in \mathcal{R}$ is said to be a visible polygon from sensor $v_i \in \mathcal{V}$ with $a_i \in \mathcal{A}_i$
if $|\mathcal{I}_{ij}(a_i)| \ge 1$, where $|\mathcal{I}|$ specifies the number of elements of a finite set $\mathcal{I}$. We also denote
the set of all visible polygons from $v_i$ with $a_i$ by $\mathcal{R}_i(a_i) \subseteq \mathcal{R}$. In addition, when a joint action
$a$ is selected, the set of sensors capturing $r_j$ as a visible polygon is denoted by $\mathcal{V}_j(a) \subseteq \mathcal{V}$.

We also model the communication structure among sensors by an undirected graph $G = (\mathcal{V}, \mathcal{E})$
with $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$. The set of all sensors whose information is available for $v_i \in \mathcal{V}$ is also denoted
as $\mathcal{N}_i := \{v_j \in \mathcal{V} \mid (v_i, v_j) \in \mathcal{E}\}$. In this paper, we use the following assumption.

Assumption 2: A pair $(v_i, v_{i'})$, $i \ne i'$, satisfies $(v_i, v_{i'}) \in \mathcal{E}$ if there exist $a_i \in \mathcal{A}_i$, $a_{i'} \in \mathcal{A}_{i'}$
and $r_j \in \mathcal{R}$ such that $r_j \in \mathcal{R}_i(a_i) \cap \mathcal{R}_{i'}(a_{i'})$.

This assumption means that if any pair of two sensors can capture a common polygon then they
need to communicate with each other, which is essentially similar to [19] and the only slight
difference in description stems from whether multi-hop communication is taken into account.
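As an illustration of Assumption 2, the edges that the communication graph must contain can be derived from the visible-polygon sets. This is a sketch under assumed data structures: the nested dictionary `visible` and the function name `communication_edges` are hypothetical and not part of the paper.

```python
from itertools import combinations


def communication_edges(visible):
    """Return the undirected edges required by Assumption 2.

    visible[i][a] is the set R_i(a) of polygons visible from sensor i
    when it takes action a.
    """
    # Polygons that sensor i can capture under at least one of its actions.
    coverable = {i: set().union(*acts.values()) for i, acts in visible.items()}
    edges = set()
    for i, k in combinations(sorted(coverable), 2):
        # Two sensors must be linked if some polygon is capturable by both.
        if coverable[i] & coverable[k]:
            edges.add((i, k))
    return edges
```

For instance, if sensor 1 can see `r2` under some action and sensor 2 can also see `r2`, the pair (1, 2) is returned, while a sensor that only ever sees a polygon nobody else can capture stays isolated.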

III. Global Objective and Utility Function 
A. Global Objective Function 

Let us formulate the global objective function $W(a) \in [0, \infty)$ to be maximized by vision
sensors $\mathcal{V}$. For this purpose, we first introduce a function $W_{ij}(a_i) \in [0, \infty)$ evaluating the value
of measurements $\mathcal{Y}_{ij}(a_i)$ of sensor $v_i \in \mathcal{V}$ about polygon $r_j \in \mathcal{R}$, i.e. the raw data of the pixels
in $\mathcal{I}_{ij}(a_i)$.

Let us assume that the function $W_{ij}(a_i)$ relies on (a) how much information $r_j \in \mathcal{R}$ contains,
and (b) quality of the image. Formally, if the quantitative values of factors (a) and (b) are denoted
by $I^{info}_{ij}(a_i) \in [0, \infty)$ and $I^{qual}_{ij}(a_i) \in [0, \infty)$ respectively, the function $W_{ij}(a_i)$ is described as

$$W_{ij}(a_i) = \tilde{W}_{ij}(I^{info}_{ij}(a_i), I^{qual}_{ij}(a_i)). \tag{2}$$

In general, the function $\tilde{W}_{ij}(\cdot, \cdot)$ is non-decreasing with respect to both $I^{info}_{ij}(a_i)$ and $I^{qual}_{ij}(a_i)$,
and the equation $I^{info}_{ij}(a_i) = I^{qual}_{ij}(a_i) = 0$ holds if $r_j$ is not visible from $v_i$, i.e. $r_j \notin \mathcal{R}_i(a_i)$.

The functions $I^{info}_{ij}(a_i)$ and $I^{qual}_{ij}(a_i) \in [0, \infty)$ respectively play roles similar to the density
function and the sensing performance function in coverage control [16], [17]. However, there are
some differences. Since vision sensors do not provide apparent physical quantities like temperature
or pressure, we need to extract the amount of information contained in the raw data $\mathcal{Y}_{ij}(a_i)$,
which makes the selection of $I^{info}_{ij}(a_i)$ non-trivial. Fortunately, there is rich literature
on information extraction from visual measurements, and we can freely choose one of the methods
depending on the targeted scenario. Some examples will be shown in the next subsection.
However, such quantities can be extracted only after gaining the visual measurement and, moreover,
the function $I^{info}_{ij}(a_i)$ is dependent on the state of the highly uncertain environment regardless
of its selection. Hence, we cannot assume availability of the value $I^{info}_{ij}(a_i)$ prior to execution
of $a_i$. Due to the problem, we will present a solution using only the past experienced values of
$I^{info}_{ij}(a_i)$ and $I^{qual}_{ij}(a_i)$ in the subsequent sections.

For visual sensor networks, the quality $I^{qual}_{ij}(a_i)$ is determined not only by the distance between
$v_i$ and $r_j$ but also by their relative pose and the focal length $\lambda_i$, which makes the function
complex. However, since the present solution, differently from coverage control [16], [17], does not
require modeling the function, it is sufficient to evaluate the quality after gaining the image.
For example, using the fraction of the image that $r_j$ occupies as $I^{qual}_{ij}(a_i) = f^{qual}(|\mathcal{I}_{ij}(a_i)| / S_i)$
with an increasing function $f^{qual}$ satisfying $f^{qual}(0) = 0$ can be a useful option.

We next consider the reward $W_j(a) \in [0, \infty)$ provided from environment $r_j$ to not a single
sensor $v_i$ but the visual sensor network $\mathcal{V}$. We assume that $W_j(a)$ is a function of $W_{ij}(a_i)$ only
for vision sensors in $\mathcal{V}_j(a)$ capturing $r_j$ as

$$W_j(a) = W_j\big((W_{ij}(a_i))_{v_i \in \mathcal{V}_j(a)}\big) \tag{3}$$

with $W_j(a) = 0$ if $\mathcal{V}_j(a) = \emptyset$ and that vision sensors $\mathcal{V}$ share the information of the function
$W_j$. In the following, we show only two typical selections of such functions. The first option is

$$W_j(a) = \max_{v_i \in \mathcal{V}_j(a)} W_{ij}(a_i) \tag{4}$$

imposing no value on the information $\mathcal{Y}_{ij}(a_i)$ of sensor $v_i$ if another sensor has better measurements
on $r_j$. The second option is to employ the function

$$W_j(a) = h\Big(\sum_{v_i \in \mathcal{V}_j(a)} W_{ij}(a_i)\Big) \tag{5}$$

for a monotonically increasing concave function $h$ with $h(0) = 0$. This function weakly accepts
the value of the measurement which is not the best among the sensors.

The goal of this paper is to present a cooperative/distributed action selection algorithm leading
vision sensors $\mathcal{V}$ to a joint action $a$ maximizing the global objective function defined by

$$W(a) := \sum_{r_j \in \mathcal{R}} W_j(a) \tag{6}$$

under the constraint that $I^{info}_{ij}(a_i)$ and $I^{qual}_{ij}(a_i)$ are available only after $v_i$ executes an action $a_i$.
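The two reward options and the global objective can be sketched as follows. This is an illustrative sketch only: the dictionary layout `w[(i, j)]`, the function names, and the choice of the square root as the concave function $h$ are assumptions made for the example.

```python
import math


def reward_max(values):
    """Option (4): only the best measurement of a polygon counts."""
    return max(values) if values else 0.0


def reward_concave(values, h=math.sqrt):
    """Option (5): a concave increasing h with h(0) = 0 applied to the sum.

    The square root is a hypothetical choice of h.
    """
    return h(sum(values)) if values else 0.0


def global_objective(w, polygons, reward=reward_max):
    """Sum the polygon rewards W_j(a) over all polygons, as in (6).

    w[(i, j)] holds W_ij(a_i) for every pair in which sensor i currently
    sees polygon j; polygons seen by nobody contribute zero.
    """
    total = 0.0
    for j in polygons:
        values = [v for (_, jj), v in w.items() if jj == j]
        total += reward(values)
    return total
```

With option (4) a second, weaker view of an already covered polygon adds nothing to the objective, while with option (5) it still adds a diminishing amount.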
Fig. 5. Initial image

Fig. 6. Current image

Fig. 7. Outputs of function $I^{info}_{ij}$



B. Examples of Function $I^{info}_{ij}$

In this subsection, we will introduce examples of the function $I^{info}_{ij}(a_i)$.
We first consider a normally static environment, and suppose that we are interested only in
whether or not each pixel $s_{i,l}$ captures environmental changes. The requirement is reflected by

$$I^{info}_{ij}(a_i) = \sum_{l \in \mathcal{I}_{ij}(a_i)} \begin{cases} 0, & \text{if } \|y_{i,l} - \bar{y}_{i,l}\| \le y_{threshold} \\ 1, & \text{if } \|y_{i,l} - \bar{y}_{i,l}\| > y_{threshold} \end{cases} \tag{7}$$

where $\bar{\mathcal{Y}}_{ij}(a_i) = \{\bar{y}_{i,l}\}_{l \in \mathcal{I}_{ij}(a_i)}$ is the stored initial image and $y_{threshold}$ is a positive scalar. A
small $I^{info}_{ij}(a_i)$ means that no serious event occurs around $r_j$, while a large $I^{info}_{ij}(a_i)$ indicates
some environmental changes. For example, suppose that the initial image for an action $a_i$ is
given by the gray scale image in Fig. 5 and the current measurement with the same $a_i$ is the
image in Fig. 6 where light is shined only in a part of the image. Then, the outputs of the
function with $y_{threshold} = 20$ are given as Fig. 7 where the black and white areas correspond
to the per-pixel values 0 and 1 in (7), respectively. If two resources $r_1$ and $r_2$ are captured as in Fig. 4 then
$r_1$ including environmental changes must provide a larger $I^{info}_{ij}(a_i)$ than $r_2$.
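A minimal sketch of the change-counting measure in (7), together with the pixel-fraction quality measure mentioned in the previous subsection, is given below for grey-scale images stored as flat lists. The function names and the identity default for the increasing function are assumptions of this example.

```python
def info_change_count(current, initial, pixels, threshold):
    """Count the pixels of a polygon whose grey value changed noticeably.

    current, initial: sequences of grey values indexed by pixel number.
    pixels: indices of the pixels capturing the polygon.
    threshold: positive scalar playing the role of the threshold in (7).
    """
    return sum(1 for l in pixels if abs(current[l] - initial[l]) > threshold)


def quality_fraction(pixels, total_pixels, f=lambda x: x):
    """Quality as f(fraction of the image occupied by the polygon).

    f is any increasing function with f(0) = 0; the identity is a
    hypothetical default.
    """
    return f(len(pixels) / total_pixels)
```

Both quantities are computed from the image actually captured, so they are available only after the action has been executed, which is exactly the restriction the learning algorithm has to respect.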

We next consider the situation where a visual sensor network monitors the sky to help
prediction/estimation of the solar radiation via remote sensing from a satellite [27]. Then, the
image data of both the bright blue sky and the cloud contain little information since such
information can be provided by the low resolution data from a satellite. Namely, it is desirable
for vision sensors to provide images capturing the borders between blue and cloudy sky. A metric
to measure such amount of information is the image entropy [26]. For example, let us assume
that a resource provides Fig. 8 and the other resource provides Fig. 9. Then, the image entropy
of Image 1 after a gray-scale processing is equal to 7.506 while that of Image 2 is 6.269. As
expected, Image 1 containing both the blue sky and the cloud provides a larger entropy.

February 11, 2013 DRAFT

The other option is to run some existing cloud detection algorithm as in [28] and to count the
number of pixels corresponding to the corner as $I^{info}_{ij}$. The output of [28] is illustrated by red
dots in Figs. 10 and 11, and Figs. 10 and 11 provide $I^{info}_{ij} = 1557$ and $I^{info}_{ij} = 63$, respectively.
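The image entropy mentioned above can be illustrated with a short sketch. The text does not spell out the exact definition used in [26], so the example assumes the common choice of the Shannon entropy (in bits) of the grey-level histogram; the function name is hypothetical.

```python
import math
from collections import Counter


def image_entropy(grey_pixels):
    """Shannon entropy (bits) of the grey-level histogram of an image.

    grey_pixels: flat sequence of integer grey values in {0, ..., 255}.
    """
    counts = Counter(grey_pixels)
    n = len(grey_pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Under this assumed definition the value lies between 0 (a uniform image) and 8 bits (all 256 grey levels equally frequent), which is consistent with the reported values 7.506 and 6.269.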

C. Utility Design and Potential Games 

We next design a utility function $U_i(a)$ which vision sensor $v_i \in \mathcal{V}$ basically tries to maximize.
Here, we use the marginal contribution utility [3], [4] for the global objective $W$ as

$$U_i(a) = W(a) - W^{-i}(a), \tag{8}$$

where $W^{-i}(a)$ is equal to the global objective $W$ in the case that $v_i$ views no polygon and
the other sensors take actions $a_{-i}$. Then, collecting all factors $\mathcal{V}$, $\mathcal{A}$, $\{\mathcal{C}_i(\cdot)\}_{v_i \in \mathcal{V}}$ and the utility
functions $\{U_i(\cdot)\}_{v_i \in \mathcal{V}}$ in (8), we can define a constrained strategic game

$$\Gamma = (\mathcal{V}, \mathcal{A}, \{U_i(\cdot)\}_{v_i \in \mathcal{V}}, \{\mathcal{C}_i(\cdot)\}_{v_i \in \mathcal{V}}). \tag{9}$$

We next introduce the following terminologies.

Definition 1 (Constrained Potential Games [5], [14]): A constrained strategic game $\Gamma$ is said
to be a constrained potential game with potential function $\phi : \mathcal{A} \to \mathbb{R}$ if for all $v_i \in \mathcal{V}$, every
$a_i \in \mathcal{A}_i$ and every $a_{-i} \in \prod_{j \ne i} \mathcal{A}_j$, the following equation holds for every $a_i' \in \mathcal{C}_i(a_i)$.

$$U_i(a_i', a_{-i}) - U_i(a_i, a_{-i}) = \phi(a_i', a_{-i}) - \phi(a_i, a_{-i}) \tag{10}$$

Definition 2 (Constrained Nash Equilibria [5], [14]): For a constrained strategic game $\Gamma$, a
joint action $a^* \in \mathcal{A}$ is said to be a constrained pure Nash equilibrium if the equation $U_i(a_i^*, a_{-i}^*) =
\max_{a_i \in \mathcal{C}_i(a_i^*)} U_i(a_i, a_{-i}^*)$ holds for all $v_i \in \mathcal{V}$.

Then, it is well known that any constrained potential game has at least one Nash equilibrium
and the potential function maximizers must be contained in the set of Nash equilibria [3], [14].
In addition, we have the following lemma from the feature of the marginal contribution utility.

Lemma 1: [4] The strategic game $\Gamma$ in (9) with (8) constitutes a constrained potential game
with potential function $\phi$ equal to the global objective function $W$.
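Lemma 1 can be checked numerically on a toy instance. The sketch below is illustrative only: the two-sensor data in `VALUE` are made up, each action is assumed to view a single polygon, the polygon reward is the maximum option (4), and the constraint sets are taken to be the full action sets.

```python
from itertools import product

# Hypothetical toy data: VALUE[i][a_i] = (polygon seen, W_ij) for sensor i.
VALUE = {
    0: {"left": ("r1", 3.0), "right": ("r2", 2.0)},
    1: {"left": ("r2", 5.0), "right": ("r3", 1.0)},
}


def objective(joint, skip=None):
    """Global objective with option (4); sensor `skip` views nothing."""
    best = {}
    for i, a_i in enumerate(joint):
        if i == skip:
            continue
        poly, w = VALUE[i][a_i]
        best[poly] = max(best.get(poly, 0.0), w)
    return sum(best.values())


def utility(i, joint):
    """Marginal contribution utility (8) of sensor i."""
    return objective(joint) - objective(joint, skip=i)


def is_potential_game():
    """Check (10) with the objective as potential for all deviations."""
    spaces = [list(VALUE[i]) for i in sorted(VALUE)]
    for joint in product(*spaces):
        for i, space in enumerate(spaces):
            for dev in space:
                other = joint[:i] + (dev,) + joint[i + 1:]
                lhs = utility(i, other) - utility(i, joint)
                rhs = objective(other) - objective(joint)
                if abs(lhs - rhs) > 1e-12:
                    return False
    return True
```

The check succeeds because the term subtracted in (8) depends only on the actions of the other sensors, so it cancels in any unilateral deviation.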

In the remaining part of this section, we clarify a computation procedure of the utility function
$U_i(a)$ after a joint action $a$ is determined. The quantities $I^{info}_{ij}$ and $I^{qual}_{ij}$ for any $r_j \in \mathcal{R}_i(a_i)$ must
be locally computed since they evaluate the image information $\mathcal{Y}_{ij}(a_i)$ itself. From (2), $W_{ij}(a_i)$
is also locally computable at $v_i \in \mathcal{V}$. In addition, we can prove the following lemma.

Lemma 2: Under Assumption 2, the utility function $U_i(a)$ in (8) for any fixed $a \in \mathcal{A}$ is
uniquely determined for all $v_i \in \mathcal{V}$ if the information

$$I^{com}_{i'} := (W_{i'j}(a_{i'}), j)_{r_j \in \mathcal{R}_{i'}(a_{i'})}, \quad v_{i'} \in \mathcal{N}_i, \tag{11}$$

is available for vision sensor $v_i$.

Proof: We first define $W_j^{-i}(a)$ which is equal to the value of $W_j(a)$ when $v_i$ views no
polygon and the other sensors take actions $a_{-i}$. Then, Equation (6) implies that

$$U_i(a) = \sum_{r_j \in \mathcal{R}_i(a_i)} \big(W_j(a) - W_j^{-i}(a)\big)$$

since $W_j(a)$, $r_j \notin \mathcal{R}_i(a_i)$ is independent of $W_{ij}(a_i)$, $v_i \notin \mathcal{V}_j(a)$ from (3). (3) and (11) also
mean $U_i$ is determined by $\{W_{i'j}(a_{i'})\}_{v_{i'} \in \mathcal{V}_j(a), r_j \in \mathcal{R}_i(a_i)}$. Assumption 2 implies that the sensors
in $\bigcup_{r_j \in \mathcal{R}_i(a_i)} \mathcal{V}_j(a)$ other than $v_i$ itself must be included in $\mathcal{N}_i$ for any $a \in \mathcal{A}$, which completes
the proof. ■
Lemma 2 and the knowledge of $W_j$ mean that $U_i$ is computable in a distributed fashion in the
sense of graph $G$. More importantly, if $v_i$ just needs to feed back $U_i(a)$ for a fixed joint action $a$,
it has only to locally execute the image processing, which is in general the hardest process in
the monitoring task, and to exchange the compacted information $I^{com}_i$ through communication.
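The distributed computation of the utility can be sketched as follows. The message format (a dictionary from polygon to value) and the function name `local_utility` are assumptions of this example; the default reward `max` corresponds to option (4).

```python
def local_utility(own, received, reward=max):
    """Marginal contribution utility computed from local data and messages.

    own: dict polygon -> W_ij(a_i) for the polygons sensor i currently sees.
    received: list of neighbour messages, each a dict polygon -> W_i'j(a_i').
    reward: the shared polygon reward function applied to a list of values.
    """
    total = 0.0
    for poly, w_own in own.items():
        # Values reported by neighbours that see the same polygon.
        others = [msg[poly] for msg in received if poly in msg]
        with_i = reward(others + [w_own])
        without_i = reward(others) if others else 0.0
        total += with_i - without_i
    return total
```

Only the polygons that the sensor itself sees enter the sum, so values reported for other polygons are simply ignored and no raw image data needs to be exchanged.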

Hereafter, we use the following assumption, which is not restrictive since it is satisfied by just 
scaling the global objective function W appropriately. 

Assumption 3: For any $(a, a')$ satisfying $a_i' \in \mathcal{C}_i(a_i)$ and $a_{-i} = a'_{-i}$, the inequality $U_i(a') -
U_i(a) < 1/2$ holds for all $v_i \in \mathcal{V}$.

IV. Learning Algorithm 

Since the potential function is equal to the global objective function $W$ (Lemma 1), the only
remaining task is to design an action selection rule determining $a_i(k)$ at each round $k \in \mathbb{Z}_+ :=
\{0, 1, 2, \cdots\}$ such that the joint action $a(k)$ is eventually led to the potential function maximizers.
Note that due to the constraint that $I^{info}_{ij}(a_i)$ is available only after an action $a_i$ is executed and
the communication constraints specified by graph $G$, $a_i(k)$ must be determined based on the past
actions $\{a_i(k')\}_{k' \le k-1}$, visual measurements $\{(y_{i,l}(k'))_{s_{i,l} \in \mathcal{S}_i}\}_{k' \le k-1}$ and communication messages
$\{(I^{com}_{i'}(k'))_{v_{i'} \in \mathcal{N}_i}\}_{k' \le k-1}$ from neighbors in $\mathcal{N}_i$. Now, we see from Lemma 2 that an algorithm
determining $a_i(k)$ based on the past actions and utilities $\{a_i(k'), U_i(a(k'))\}_{k' \le k-1}$ meets the
requirement, and such algorithms are called payoff-based learning [5], [14].

In this paper, we present a learning algorithm Payoff-based Inhomogeneous Partially Irrational
Play (PIPIP) based on the algorithm in [14]. In the algorithm, every vision sensor $v_i \in \mathcal{V}$
chooses its own action $a_i(k)$ concurrently at each round $k \in \mathbb{Z}_+$ using only the past two actions
$a_i(k-2)$, $a_i(k-1)$ and utilities $U_i(a(k-2))$, $U_i(a(k-1))$ stored in memory, based on the policies
called exploration, exploitation and irrational decision.

Initially, each sensor $v_i \in \mathcal{V}$ executes an action $a_i$ randomly (uniformly) chosen from $\mathcal{A}_i$ and
feeds back the resulting utility $U_i(a)$. Then, set $a_i(0) = a_i(1) = a_i$ and $U_i(a(0)) = U_i(a(1)) =
U_i(a)$. At round $k \ge 2$, if $U_i(a(k-1)) \ge U_i(a(k-2))$ holds, then every sensor $v_i \in \mathcal{V}$ chooses
action $a_i(k)$ concurrently according to the rule:

• (exploration) $a_i(k)$ is randomly chosen from $\mathcal{C}_i(a_i(k-1)) \setminus \{a_i(k-1)\}$ with probability $\varepsilon$,

• (exploitation) $a_i(k) = a_i(k-1)$ with probability $1 - \varepsilon$,

where the parameter $\varepsilon \in (0, 1/2]$ is called an exploration rate. Otherwise ($U_i(a(k-1)) <
U_i(a(k-2))$), action $a_i(k)$ is chosen according to the rule:

• (exploration) $a_i(k)$ is randomly chosen from $\mathcal{C}_i(a_i(k-1)) \setminus \{a_i(k-1), a_i(k-2)\}$ with
probability $\varepsilon$,

• (exploitation) $a_i(k) = a_i(k-2)$ with probability $(1 - \varepsilon)(1 - \kappa \varepsilon^{\Delta_i(k)})$, where $\Delta_i(k) :=
U_i(a(k-2)) - U_i(a(k-1))$ and $\kappa \in [0, 1/2]$,

• (irrational decision) $a_i(k) = a_i(k-1)$ with probability $(1 - \varepsilon) \kappa \varepsilon^{\Delta_i(k)}$.

It is clear under the third item of Assumption 1 that the action $a_i(k)$ is well-defined.

Finally, each $v_i$ executes the selected action $a_i(k)$ and computes the resulting utility $U_i(a(k))$
by the procedure stated at the end of Section III. At the next round, sensors repeat the same
procedure.

The procedure of PIPIP associated with sensor $v_i \in \mathcal{V}$ is compactly described in Algorithm 1,
where the function $\mathrm{rnd}(\mathcal{A}')$ outputs an action chosen from the set $\mathcal{A}'$ according to the uniform
distribution. The important feature of PIPIP is to allow vision sensors to make the irrational
decisions at Step 2. Indeed, Algorithm 1 with $\kappa = 0$ is the same as the algorithm in [14].

Algorithm 1 Payoff-based Inhomogeneous Partially Irrational Play
Initialization: Action $a_i$ is chosen randomly from $\mathcal{A}_i$. Set $a_i^1 \leftarrow a_i$, $a_i^2 \leftarrow a_i$, $U_i^1 \leftarrow
U_i(a)$, $U_i^2 \leftarrow U_i(a)$, $\Delta_i \leftarrow 0$ for all $v_i \in \mathcal{V}$ and $k \leftarrow 2$.
Step 1: Update $\varepsilon$, if necessary.
Step 2: If $U_i^1 \ge U_i^2$, then
    $a_i^{tmp} \leftarrow \mathrm{rnd}(\mathcal{C}_i(a_i^1) \setminus \{a_i^1\})$, with probability $\varepsilon$
    $a_i^{tmp} \leftarrow a_i^1$, with probability $1 - \varepsilon$
  Otherwise,
    $a_i^{tmp} \leftarrow \mathrm{rnd}(\mathcal{C}_i(a_i^1) \setminus \{a_i^1, a_i^2\})$, with probability $\varepsilon$
    $a_i^{tmp} \leftarrow a_i^1$, with probability $(1 - \varepsilon)(\kappa \cdot \varepsilon^{\Delta_i})$
    $a_i^{tmp} \leftarrow a_i^2$, with probability $(1 - \varepsilon)(1 - \kappa \cdot \varepsilon^{\Delta_i})$
Step 3: Execute the selected action $a_i^{tmp}$.
Step 4: Compute utility $U_i(a^{tmp})$ by the following procedure:
  Step 4.1: Feed back the visual measurements $\{y_{i,l}\}_{s_{i,l} \in \mathcal{S}_i}$, extract $(I^{info}_{ij}, I^{qual}_{ij})$ from the
  measurements, and calculate $W_{ij}$ for all $r_j \in \mathcal{R}_i(a_i^{tmp})$.
  Step 4.2: Set $I^{com}_i \leftarrow (W_{ij}(a_i^{tmp}), j)_{r_j \in \mathcal{R}_i(a_i^{tmp})}$ and send it to $\mathcal{N}_i$.
  Step 4.3: Receive $(I^{com}_{i'})_{v_{i'} \in \mathcal{N}_i}$ and compute utility $U_i(a^{tmp})$.
Step 5: Set $a_i^2 \leftarrow a_i^1$, $a_i^1 \leftarrow a_i^{tmp}$, $U_i^2 \leftarrow U_i^1$, $U_i^1 \leftarrow U_i(a^{tmp})$ and $\Delta_i \leftarrow U_i^2 - U_i^1$.
Step 6: $k \leftarrow k + 1$ and go to Step 1.
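The action selection at Step 2 can be sketched for a single sensor as below. This is a minimal illustration rather than the authors' implementation: the function name `pipip_select`, the argument layout and the injectable random source `rng` are assumptions, and actions are assumed to be sortable so that the candidate list has a fixed order.

```python
import random


def pipip_select(a1, a2, u1, u2, constraint, eps, kappa, rng=random):
    """One execution of Step 2 of Algorithm 1 for a single sensor.

    a1, u1: action and utility at round k-1; a2, u2: those at round k-2.
    constraint: function returning the set of actions selectable from a.
    eps: exploration rate; kappa: weight of the irrational decision.
    rng: object providing random() and choice(), e.g. the random module.
    """
    if u1 >= u2:
        if rng.random() < eps:                       # exploration
            return rng.choice(sorted(constraint(a1) - {a1}))
        return a1                                    # exploitation
    delta = u2 - u1                                  # positive in this branch
    r = rng.random()
    if r < eps:                                      # exploration
        return rng.choice(sorted(constraint(a1) - {a1, a2}))
    if r < eps + (1 - eps) * kappa * eps ** delta:   # irrational decision
        return a1
    return a2                                        # exploitation: go back
```

Setting `kappa` to zero removes the irrational branch, so a sensor whose utility has dropped either explores or returns to its previous action.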

Let us first consider Algorithm 1 with a constant $\varepsilon \in (0, 1/2]$ skipping Step 1, which is called
Payoff-based Homogeneous Partially Irrational Play (PHPIP) in this paper. Then, we have the
following theorem, which will be proved in the next section.

Theorem 1: Consider a constrained strategic game $\Gamma$ in (9) with (8) satisfying Assumptions
1 and 3, and sensors following PHPIP. Then, given any probability $p < 1$, if the exploration rate
$\varepsilon$ is sufficiently small, for all sufficiently large $k$, the following inequality holds.

$$\mathrm{Prob}\Big[a(k) \in \arg\max_{a \in \mathcal{A}} \phi(a)\Big] > p. \tag{12}$$

Theorem 1 ensures that the optimal actions maximizing the global objective $W$ are eventually
selected with high probability (Lemma 1) if the exploration rate $\varepsilon$ is sufficiently small. However,
it is difficult to reveal a quantitative relation between the probability $p$ and $\varepsilon$ in (12).

We next consider Algorithm 1 with the following update rule of $\varepsilon$ at Step 1 similarly to [14].

$$\varepsilon(k) = k^{-\frac{1}{n(D+1)}}, \tag{13}$$

where $D$ is defined as $D := \max_{v_i \in \mathcal{V}} D_i$ and $D_i$ is the minimal number of steps required for
transitioning between any two actions of $v_i$. Then, the following theorem holds.

Theorem 2: Consider a constrained strategic game $\Gamma$ in (9) with (8) satisfying Assumptions
1 and 3. Suppose that each vision sensor obeys Algorithm 1 with (13). Then, if the parameter
$\kappa$ is chosen so as to satisfy

$$\kappa \in \Big(\frac{1}{C - 1}, \frac{1}{2}\Big], \quad C := \max_{v_i \in \mathcal{V}} \max_{a_i \in \mathcal{A}_i} |\mathcal{C}_i(a_i)|, \tag{14}$$

the following equation holds.

$$\lim_{k \to \infty} \mathrm{Prob}\Big[a(k) \in \arg\max_{a \in \mathcal{A}} \phi(a)\Big] = 1 \tag{15}$$

From Lemma 1, Equation (15) means that the probability that vision sensors take one of the
global objective function maximizers converges to 1.

The statement of Theorem 1 is compatible with [5]. The contribution of PIPIP is simplicity of
the action selection rule and the convergence result in Theorem 2 similarly to [14]. Meanwhile,
the concurrent version of the algorithm in [14] ensures convergence in probability to Nash
equilibria but does not guarantee convergence to potential function maximizers, i.e. the global
objective function maximizers in the context of this paper. In contrast, thanks to the irrational
decisions, PIPIP leads sensors to the potential function maximizers. Namely, PIPIP embodies
the desirable features of both algorithms in [5] and [14].

Remark that PIPIP guarantees the above theorems even in the presence of the action constraints
specified by the set $\mathcal{C}_i$. This allows one to take account of physical constraints of PTZ cameras.
More importantly, constraints can be useful as a design parameter. Indeed, depending on the
scenarios, persistent possibility of explorations toward all elements of $\mathcal{A}_i$ can lead to volatile
behavior of the global objective function in the practical use of the learning algorithm. In such
a situation, adding some virtual constraints works for stabilizing the evolution.

Note that both of the above theorems address static games, which indicates that the present
algorithm works in the scenarios of monitoring a normally static environment including sudden
changes since the environments before and after the change are both static. In addition, application
of the conclusions in [29] to Theorem 1 means that the present algorithm successfully adapts to
gradual environmental changes as long as the speed of the dynamics is sufficiently slow.

V. Stability Analysis 
A. Preliminary: Fundamentals of Resistance Tree 

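Let us consider a Markov process $\{P^0\}$ defined over a finite state space $\mathcal{X}$. A perturbed
Markov process $\{P^\varepsilon\}$, $\varepsilon \in [0, 1]$ is defined as a process such that the transition of $\{P^\varepsilon\}$ follows
$\{P^0\}$ with probability $1 - \varepsilon$ and does not follow with probability $\varepsilon$. In particular, we focus on
a regular perturbation defined below.

Definition 3 (Regular Perturbation [30]): A family of stochastic processes $\{P^\varepsilon\}$ is called a
regular perturbation of $\{P^0\}$ if the following conditions are satisfied:
(A1) For some $\varepsilon^* > 0$, the process $\{P^\varepsilon\}$ is irreducible and aperiodic for all $\varepsilon \in (0, \varepsilon^*]$.
(A2) Let us denote by $P^\varepsilon_{x^1 x^2}$ the transition probability from $x^1 \in \mathcal{X}$ to $x^2 \in \mathcal{X}$ along with the
Markov process $\{P^\varepsilon\}$. Then, $\lim_{\varepsilon \to 0} P^\varepsilon_{x^1 x^2} = P^0_{x^1 x^2}$ holds for all $x^1, x^2 \in \mathcal{X}$.
(A3) If $P^\varepsilon_{x^1 x^2} > 0$ for some $\varepsilon$, then there exists a real number $\chi(x^1 \to x^2) \geq 0$ such that

\[ \lim_{\varepsilon \to 0} \frac{P^\varepsilon_{x^1 x^2}}{\varepsilon^{\chi(x^1 \to x^2)}} \in (0, \infty), \tag{16} \]

where $\chi(x^1 \to x^2)$ is called the resistance of transition $x^1 \to x^2$.

We next consider a path $\rho$ from $x \in \mathcal{X}$ to $x' \in \mathcal{X}$ along with transitions $x^{(1)} = x \to x^{(2)} \to
\cdots \to x^{(m)} = x'$ and denote by $P^\varepsilon(\rho)$ the probability of the sequence of transitions. Then, the
resistance $\chi(\rho)$ of a path $\rho$ is defined as the value satisfying

\[ \lim_{\varepsilon \to 0} \frac{P^\varepsilon(\rho)}{\varepsilon^{\chi(\rho)}} \in (0, \infty). \tag{17} \]

Then, it is easy to confirm that $\chi(\rho)$ is simply given by

\[ \chi(\rho) = \sum_{i=1}^{m-1} \chi(x^{(i)} \to x^{(i+1)}). \tag{18} \]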
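The additivity of path resistance can be checked numerically. The following is a minimal sketch, assuming a simplified model (not taken from the paper) in which every one-step probability has exactly the form $c_i \varepsilon^{\chi_i}$; the constants and resistances in `steps` are hypothetical values chosen only for illustration.

```python
import math

def path_resistance(step_resistances):
    """Resistance of a path: the sum of its one-step resistances, as in (18)."""
    return sum(step_resistances)

def path_probability(steps, eps):
    """Probability of a path whose i-th step has probability c_i * eps**chi_i."""
    return math.prod(c * eps ** chi for c, chi in steps)

# Hypothetical three-step path: (constant c_i, resistance chi_i) per transition.
steps = [(0.5, 1.0), (0.25, 0.3), (1.0, 0.0)]
chi = path_resistance(chi_i for _, chi_i in steps)

# The ratio P(path) / eps**chi stays at 0.5 * 0.25 * 1.0 as eps shrinks,
# so chi is the exponent required by (17).
ratios = [path_probability(steps, eps) / eps ** chi for eps in (1e-1, 1e-2, 1e-3)]
```

Under this model the ratio is constant in $\varepsilon$; for a real perturbed process it only converges to a positive constant as $\varepsilon \to 0$.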

A state $x \in \mathcal{X}$ is said to communicate with state $x' \in \mathcal{X}$ if both $x \leadsto x'$ and $x' \leadsto x$ hold,
where the notation $x \leadsto x'$ implies that $x'$ is accessible from $x$, i.e., a process starting at state
$x$ has non-zero probability of transitioning into $x'$ at some point. A recurrent communication
class is a class such that every pair of states in the class communicates with each other and no
state outside the class is accessible from the class. Let $H_1, \cdots, H_J$ be recurrent communication
classes of the unperturbed Markov process $\{P^0\}$. Then, within each class, there is a path with zero
resistance from every state to every other. In the case of a perturbed Markov process $\{P^\varepsilon\}$, there
may exist several paths from states in $H_l$ to states in $H_{l'}$ for any two distinct classes $H_l$ and $H_{l'}$.

We next define a weighted directed graph $G_r = (\mathcal{H}, \mathcal{E}_r, \mathcal{W}_r)$ over the recurrent communication
classes $\mathcal{H} = \{H_1, \cdots, H_J\}$ of $\{P^0\}$, where the weight $w_{ll'} \in \mathcal{W}_r$ of each edge $(H_l, H_{l'})$
is equal to the minimal resistance among all paths over $\{P^\varepsilon\}$ from a state in $H_l$ to a state in
$H_{l'}$. We also define an $l$-tree, which is a spanning tree over $G_r$ with root $H_l$ such that, for every
$H_{l'} \neq H_l$, there is a unique path from $H_{l'}$ to $H_l$. The resistance of an $l$-tree is the sum of the
weights of all the edges over the tree. The tree with the minimal resistance among all $l$-trees is
called the minimal resistance tree, and the corresponding minimal resistance is called the stochastic
potential of $H_l$. Let us now introduce the notion of stochastically stable state.
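For small graphs, the stochastic potential defined above can be computed by exhaustive search. The sketch below is an illustration only: the function names and the three-class weight table are hypothetical, and brute force is used here in place of any algorithm from the paper.

```python
from itertools import product

def reaches_root(u, parent, root):
    """Follow the chosen outgoing edges from u; True if the root is reached."""
    seen = set()
    while u != root:
        if u in seen:
            return False
        seen.add(u)
        u = parent[u]
    return True

def stochastic_potential(weights, root):
    """Minimal total weight over all spanning trees directed toward `root`.

    weights[u][v] is the resistance of the edge u -> v. Every non-root node
    picks one outgoing edge; the choice is a valid tree iff each node reaches
    the root by following the chosen edges.
    """
    nodes = list(weights)
    others = [u for u in nodes if u != root]
    best = float("inf")
    for choice in product(nodes, repeat=len(others)):
        parent = dict(zip(others, choice))
        if any(v not in weights[u] for u, v in parent.items()):
            continue
        if all(reaches_root(u, parent, root) for u in others):
            best = min(best, sum(weights[u][v] for u, v in parent.items()))
    return best

# Hypothetical resistances between three recurrent communication classes.
weights = {
    "H1": {"H2": 1.0, "H3": 2.0},
    "H2": {"H1": 1.4, "H3": 1.3},
    "H3": {"H1": 1.2, "H2": 1.0},
}
potentials = {h: stochastic_potential(weights, h) for h in weights}
minimizer = min(potentials, key=potentials.get)
```

In this example the class `H2` attains the smallest stochastic potential, since both other classes can reach it through edges of weight 1.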

Definition 4 (Stochastically Stable State [30]): A state $x \in \mathcal{X}$ is said to be stochastically
stable, if $x$ satisfies $\lim_{\varepsilon \to 0+} \mu_x(\varepsilon) > 0$, where $\mu_x(\varepsilon)$ is the value of the element of the stationary
distribution $\mu(\varepsilon)$ corresponding to state $x$.

Then, we can use the following result linking stochastically stable states and stochastic potential.

Lemma 3: [30] Let $\{P^\varepsilon\}$ be a regular perturbation of $\{P^0\}$. Then $\lim_{\varepsilon \to 0+} \mu(\varepsilon)$ exists and the
limiting distribution $\mu(0)$ is a stationary distribution of $\{P^0\}$. Moreover the stochastically stable states
are contained in recurrent communication classes of $\{P^0\}$ with minimum stochastic potential.
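A two-state toy chain makes the statement of Lemma 3 concrete. The sketch below is not from the paper: it assumes a hypothetical perturbation of the identity chain in which the transition $x \to y$ has resistance 2 and $y \to x$ has resistance 1, so that $\{x\}$ has stochastic potential 1 and $\{y\}$ has stochastic potential 2.

```python
def stationary_two_state(p_xy, p_yx):
    """Stationary distribution (mu_x, mu_y) of a two-state chain with
    transition probabilities x -> y = p_xy and y -> x = p_yx."""
    total = p_xy + p_yx
    return p_yx / total, p_xy / total

def perturbed_chain(eps):
    """Hypothetical regular perturbation of the identity chain:
    x -> y with probability eps**2, y -> x with probability eps."""
    return eps ** 2, eps

# Stationary distributions for a shrinking exploration rate.
mus = {eps: stationary_two_state(*perturbed_chain(eps)) for eps in (1e-1, 1e-2, 1e-3)}
```

Here $\mu_x(\varepsilon) = 1/(1+\varepsilon)$, so the stationary mass concentrates on $x$, the class with the smaller stochastic potential, as $\varepsilon \to 0$.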
Fig. 12. Image of Markov process $\{P^0\}$
Fig. 13. Image of Markov process $\{P^\varepsilon\}$ (transitions colored red can happen only when $\varepsilon \neq 0$)
Legend for both figures: $z^* = (a^*, a^*)$ with $a^*$ a potential function maximizer; $z = (a, a)$; $z = (a_1, a_2)$ with $a_1 \neq a_2$.



B. Auxiliary Results 

We first consider PHPIP with a constant exploration rate $\varepsilon$. Then, the transitions of the state
$z(k) = (a(k-1), a(k))$ for PHPIP are described by a perturbed homogeneous Markov process
$\{P^\varepsilon\}$ on the state space $\mathcal{B} := \{(a, a') \in \mathcal{A} \times \mathcal{A} \mid a'_i \in \mathcal{C}_i(a_i)\ \forall i \in \mathcal{V}\}$. In the following, we use
the notation $\mathrm{diag}(\mathcal{A}') = \{(a, a) \in \mathcal{A} \times \mathcal{A} \mid a \in \mathcal{A}'\}$, $\mathcal{A}' \subseteq \mathcal{A}$ similarly to [14].

In terms of the Markov process $\{P^\varepsilon\}$ induced by PHPIP, the following lemma holds.

Lemma 4: The Markov process $\{P^\varepsilon\}$ induced by PHPIP applied to the constrained strategic
game $\Gamma$ in (9) with (8) is a regular perturbation of $\{P^0\}$ under Assumption 1.

Proof: See Appendix A. ■
From Lemma 4, the perturbed process $\{P^\varepsilon\}$ is irreducible and aperiodic, and hence there exists
a unique stationary distribution $\mu(\varepsilon)$ for every $\varepsilon$. We also see from the former half of Lemma 3
that $\lim_{\varepsilon \to 0+} \mu(\varepsilon)$ exists and the limiting distribution $\mu(0)$ is the stationary distribution of $\{P^0\}$.

We also have the following lemma.

Lemma 5: Consider the Markov process $\{P^\varepsilon\}$ induced by PHPIP applied to the constrained
strategic game $\Gamma$ in (9) with (8). Then, the recurrent communication classes of the unperturbed
Markov process $\{P^0\}$ are given by elements of $\mathrm{diag}(\mathcal{A}) = \{(a, a) \in \mathcal{A} \times \mathcal{A} \mid a \in \mathcal{A}\}$, namely

\[ H_l = \{(a^l, a^l)\}, \quad l = 1, \cdots, |\mathcal{A}|. \tag{19} \]

Proof: See Appendix B. ■
The lemma means that all the paths over the Markov process $\{P^0\}$ eventually reach and remain
at a state such that $a(k-1) = a(k)$ as illustrated in Fig. 12. Meanwhile, the process $\{P^\varepsilon\}$, $\varepsilon \neq 0$
contains paths traversing two of such states as in Fig. 13.
Fig. 14. Straight route $(a^1, a^1) \to (a^1, a^2) \to (a^2, a^2)$ (the numbers above the arrows describe resistances of transitions: 1 for the first arrow, 0 or $\Delta_i$ for the second)

We next introduce the following terminology, where we use the notation

\[ \mathcal{E}_s := \{(z = (a, a), z' = (a', a')) \in \mathrm{diag}(\mathcal{A}) \times \mathrm{diag}(\mathcal{A}) \mid
\exists i \in \mathcal{V} \text{ s.t. } a_i \in \mathcal{C}_i(a'_i),\ a_i \neq a'_i \text{ and } a_{-i} = a'_{-i}\}. \tag{20} \]

Definition 5 (Straight Route): A feasible path over $\{P^\varepsilon\}$ from $z^1 = (a^1, a^1)$ to $z^2 = (a^2, a^2)$
such that $(z^1, z^2) \in \mathcal{E}_s$ is said to be a straight route if the path consists of the two-round transitions
in which only the sensor $v_i$ satisfying $a_i^1 \neq a_i^2$ chooses $a_i(k+1) = a_i^2$ through exploration at the first round and
also chooses $a_i(k+2) = a_i^2$ at the second round, while the other sensors do not update their
actions (Fig. 14). In addition, a feasible path over $\{P^\varepsilon\}$ from $z^1 \in \mathrm{diag}(\mathcal{A})$ to $z^2 \in \mathrm{diag}(\mathcal{A})$ is
said to be an $M$-straight-route if the path contains $M$ nodes in $\mathrm{diag}(\mathcal{A})$ including $z^1$ and $z^2$,
visits each of the $M$ nodes only once, and all paths between consecutive such nodes are straight routes.

In terms of the straight route, we have the following lemmas.

Lemma 6: Consider paths from any state $z^1 = (a^1, a^1) \in \mathrm{diag}(\mathcal{A})$ to any state $z^2 = (a^2, a^2) \in
\mathrm{diag}(\mathcal{A})$ such that $(z^1, z^2) \in \mathcal{E}_s$ over the process $\{P^\varepsilon\}$ induced by PHPIP applied to the game
$\Gamma$ in (9) with (8). Then, under Assumption 3, the resistance $\chi(\rho)$ of the straight route $\rho$ from $z^1$
to $z^2$ is strictly smaller than 3/2 and $\chi(\rho)$ is minimal among all paths from $z^1$ to $z^2$.

Proof: See Appendix C. ■

Lemma 7: Consider the Markov process $\{P^\varepsilon\}$ induced by PHPIP applied to the game $\Gamma$ in
(9) with (8). Let us describe an $M$-straight-route $\rho$ from state $z^1 = (a^1, a^1)$ to state $z^2 = (a^2, a^2)$
as $\rho : z^1 = z^{(1)} \Rightarrow z^{(2)} \Rightarrow \cdots \Rightarrow z^{(M-1)} \Rightarrow z^{(M)} = z^2$, where $z^{(l)} = (a^{(l)}, a^{(l)}) \in \mathrm{diag}(\mathcal{A})$, $l \in
\{1, \cdots, M\}$ and all the arrows between them are straight routes. In addition, we consider the
(reverse) $M$-straight-route $\rho' : z^2 = z^{(M)} \Rightarrow z^{(M-1)} \Rightarrow \cdots \Rightarrow z^{(2)} \Rightarrow z^{(1)} = z^1$ from $z^2$ to $z^1$. Then,
under Assumption 3, if $\phi(a^1) > \phi(a^2)$, the inequality $\chi(\rho) > \chi(\rho')$ holds true.
Fig. 15. Graph $G_r$
Fig. 16. Illustrative example of a two player game

Proof: See Appendix D. ■

C. Proof of Theorems 
Proof of Theorem 1

Let us form the directed graph $G_r = (\mathcal{H}, \mathcal{E}_r, \mathcal{W}_r)$ as in Subsection V-A over the recurrent
communication classes for the unperturbed Markov process $\{P^0\}$ induced by PHPIP (Fig. 15).
From (19), the node set $\mathcal{H}$ of the graph $G_r$ is given by $\mathrm{diag}(\mathcal{A})$. Since all the recurrent
communication classes have only one element from (19), the weight of the edge for any two
states $z^1$ and $z^2 \in \mathrm{diag}(\mathcal{A})$ is simply given by the path with the minimal resistance among all
paths from $z^1$ to $z^2$ over $\{P^\varepsilon\}$. In addition, Lemma 6 proves that if $(z^1, z^2) \in \mathcal{E}_s$ defined by (20),
the minimal weight is given by the straight route from $z^1$ to $z^2$. For instance, let us consider a
two player game with $\mathcal{A}_1 = \{a_1^1, a_1^2\}$ and $\mathcal{A}_2 = \{a_2^1, a_2^2\}$. Then, graph $G_r$ is illustrated as in Fig.
16, where only the edges colored blue are contained in $\mathcal{E}_d := \mathcal{E}_r \setminus \mathcal{E}_s$ and have resistance
at least 2 since both players have to explore to escape from the state.

Let us focus on $l$-trees over $G_r$ with a root $H_l = \{z^l\}$, $z^l \in \mathrm{diag}(\mathcal{A})$. Recall now that the resistance
of the tree is the sum of the weights of all edges constituting the tree as defined in Subsection
V-A. Let us now consider a tree for the graph in Fig. 16 containing an edge in $\mathcal{E}_d$ (left figure
of Fig. 17). Then, it is easy to confirm that a tree with a smaller resistance can be formed by
replacing the edge in $\mathcal{E}_d$ by an edge in $\mathcal{E}_s$ as illustrated in the right figure of Fig. 17. From this
example, we have a conjecture that the minimal resistance tree consists only of edges in $\mathcal{E}_s$. The
following lemma proves that the conjecture is true for a general case with $n$ sensors.

Fig. 18. Resistance trees (the red node is the state $z^* = (a^*, a^*)$ with the potential function maximizer $a^*$)

Lemma 8: Consider the weighted directed graph $G_r$ constituted from the Markov process
$\{P^\varepsilon\}$ induced by PHPIP applied to the constrained strategic game $\Gamma$ in (9) with (8). Let us denote
by $T = (\mathrm{diag}(\mathcal{A}), \mathcal{E}_l, \mathcal{W}_l)$ the minimal resistance tree with root $z^l \in \mathrm{diag}(\mathcal{A})$. If Assumptions 1
and 3 are satisfied, then the edge set $\mathcal{E}_l$ must be a subset of $\mathcal{E}_s$.

Proof: See Appendix E. ■

We are ready to prove Theorem 1. It is now sufficient to prove that all the stochastically
stable states of $\{P^\varepsilon\}$ are included in $\mathrm{diag}(\arg\max_{a \in \mathcal{A}} \phi(a))$, since the probability of $a(k) \in
\arg\max_{a \in \mathcal{A}} \phi(a)$ is no smaller than the probability of $z(k) = (a(k-1), a(k)) \in \mathrm{diag}(\arg\max_{a \in \mathcal{A}} \phi(a))$.
We also see from (19) and Lemmas 3 and 4 that we need only to prove that the states in $\mathrm{diag}(\mathcal{A})$
with the minimal stochastic potential are included in $\mathrm{diag}(\arg\max_{a \in \mathcal{A}} \phi(a))$.

We first introduce the notations $z' = (a', a') \in \mathrm{diag}(\mathcal{A})$ with $a' \notin \arg\max_{a \in \mathcal{A}} \phi(a)$ and
$z^* = (a^*, a^*) \in \mathrm{diag}(\mathcal{A})$ with $a^* \in \arg\max_{a \in \mathcal{A}} \phi(a)$. Let the minimal resistance tree for the
state $z'$ be denoted by $T$. Then, there exists a unique path $\rho$ from $z^*$ to $z'$ over $T$. From Lemma
8, the path $\rho$ corresponds to an $M$-straight-route for some $M$. Now, we can build a tree $T'$ with
root $z^*$ such that only the path $\rho$ is replaced by its reverse path $\rho'$ (Fig. 18). Then, we have
$\chi(\rho) > \chi(\rho')$ from Lemma 7 since $\phi(a^*) > \phi(a')$. Thus, the resistance of $T'$ is smaller than
that of $T$ and the stochastic potential of $z^*$ is smaller than or equal to the resistance of $T'$. Hence
the stochastic potential of $z^*$ is strictly smaller than that of $z'$. The
statement holds regardless of the selection of $a'$. This completes the proof.
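The argument can be checked by brute force on a game as small as the one in Fig. 16. The sketch below makes several assumptions that are not in the paper: the potential values are hypothetical, both players share the potential as their utility, actions are unconstrained, every edge in $\mathcal{E}_s$ is given the straight-route resistance of Lemma 6, and every edge in $\mathcal{E}_d$ is given its lower bound 2.

```python
from itertools import product

# Hypothetical potential values for a 2-player, 2-action identical-interest game.
phi = {(0, 0): 0.0, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
nodes = list(phi)

def in_Es(a, b):
    """True if exactly one player changes its action between a and b."""
    return sum(x != y for x, y in zip(a, b)) == 1

def weight(a, b):
    """Edge weight a -> b: straight-route resistance on E_s, assumed 2.0 on E_d."""
    if in_Es(a, b):
        return 1.0 + max(0.0, phi[a] - phi[b])
    return 2.0

def reaches(u, parent, root):
    """Follow the chosen outgoing edges from u; True if the root is reached."""
    seen = set()
    while u != root:
        if u in seen:
            return False
        seen.add(u)
        u = parent[u]
    return True

def min_resistance_tree(root):
    """Brute-force search over all spanning trees directed toward `root`."""
    others = [u for u in nodes if u != root]
    best_cost, best_tree = float("inf"), None
    for choice in product(nodes, repeat=len(others)):
        parent = dict(zip(others, choice))
        if any(u == v for u, v in parent.items()):
            continue
        if not all(reaches(u, parent, root) for u in others):
            continue
        cost = sum(weight(u, v) for u, v in parent.items())
        if cost < best_cost:
            best_cost, best_tree = cost, parent
    return best_cost, best_tree

trees = {root: min_resistance_tree(root) for root in nodes}
potentials = {root: cost for root, (cost, _) in trees.items()}
stable = min(potentials, key=potentials.get)
```

In this instance every minimal resistance tree uses only edges in $\mathcal{E}_s$, and the joint action with the largest potential has the smallest stochastic potential, in line with Lemma 8 and Theorem 1.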
Proof of Theorem 2

Let us next consider PIPIP with time-varying $\varepsilon(k)$ and first prove strong ergodicity of the
inhomogeneous Markov process $\{P_k^\varepsilon\}$ induced by PIPIP. Here, a Markov process $\{P_k\}$ over a
state space $\mathcal{X}$ is said to be strongly ergodic [31] if there exists a stochastic vector $\mu^*$ such that
the following equation holds for any distribution $\mu$ on $\mathcal{X}$ and time $k_0$.

\[ \lim_{k \to \infty} \mu P(k_0, k) = \mu^*, \quad P(k_0, k) := \prod_{k'=k_0}^{k-1} P_{k'}, \quad 0 \leq k_0 < k. \tag{21} \]

If $\{P_k\}$ is strongly ergodic, the distribution $\mu$ converges to the unique distribution $\mu^*$ from any
initial state. Meanwhile, the process $\{P_k\}$ is said to be weakly ergodic [31] if the following
equation holds for all $x^1, x^2, x^3 \in \mathcal{X}$ and $k_0 \in \mathbb{Z}_+$.

\[ \lim_{k \to \infty} \left( P_{x^1 x^3}(k_0, k) - P_{x^2 x^3}(k_0, k) \right) = 0 \]
Here, we also use the following lemmas. 

Lemma 9: [31] A Markov process $\{P_k\}$ is strongly ergodic if the following conditions hold:
(B1) The Markov process $\{P_k\}$ is weakly ergodic. (B2) For each $k$, there exists a stochastic vector
$\mu^k$ on $\mathcal{X}$ such that $\mu^k$ is the left eigenvector of the transition matrix $P(k)$ with eigenvalue 1. (B3)
The eigenvector $\mu^k$ in (B2) satisfies $\sum_{k=0}^{\infty} \sum_{x \in \mathcal{X}} |\mu_x^k - \mu_x^{k+1}| < \infty$. Moreover, if $\mu^* = \lim_{k \to \infty} \mu^k$,
then $\mu^*$ is the vector in (21).

Lemma 10: [31] A Markov process $\{P_k\}$ is weakly ergodic if and only if there is a strictly
increasing sequence of positive numbers $k_t$, $t \in \mathbb{Z}_+$ such that

\[ \sum_{t=0}^{\infty} \min_{x^1, x^2 \in \mathcal{X}} \sum_{x \in \mathcal{X}} \min\{P_{x^1 x}(k_t, k_{t+1}), P_{x^2 x}(k_t, k_{t+1})\} = \infty \]

We next prove strong ergodicity of $\{P_k^\varepsilon\}$. Conditions (B2), (B3) in Lemma 9 can be proved
in the same way as [14]. We thus mention only Condition (B1). Recall now that the probability
of transition $z^1 \to z^2$ is given by (24). Since $\varepsilon(k)$ is strictly decreasing, there is $k_0 \geq 1$ such
that $k_0$ is the first round satisfying

F{k\ F(k)(^-^^') 

(1 - e{k)){l - Ke{k)^^) > 1 - e{k) > (22) 

The existence of $\varepsilon$ satisfying (22) is guaranteed by (14). For all $k \geq k_0$, we have

e{k) 
Fig. 19. PTZ network camera
Fig. 20. Overview of scenario



We next define $x_z = \arg\min_{x \in \mathcal{B}} P_{xz}(k, k+D+1)$ for any $z \in \mathcal{B}$. Then, similarly to [14],

je{k) e(k+D-l) pe{k+D) ^ / ^{k) \ "(^+1) 



Following [14] again, we have the following inequality with $k_t = (D+1)t$ and $(D+1)t_0 \geq k_0$.
min V'min{P^i^(A;„ A;,+i),P,2^^(A;„ A;,+i)} > (— — 

t=0 ' zGB '.='-0 



>^ 1 



(C - ^ (D + 1)6 

t=to 

This inequality and Lemma 10 prove (B1) and hence strong ergodicity of $\{P_k^\varepsilon\}$. Thus, the
distribution $\mu(\varepsilon(k))$ converges to the unique distribution $\mu^*$ from any initial state. In addition,
we also have $\mu^* = \mu(0) = \lim_{\varepsilon \to 0} \mu(\varepsilon)$ from $\lim_{k \to \infty} \varepsilon(k) = 0$. We have already proved in
Theorem 1 that any state $z$ satisfying $\mu_z(0) > 0$ must be included in $\mathrm{diag}(\arg\max_{a \in \mathcal{A}} \phi(a))$.
Hence, (15) holds and the proof of Theorem 2 is completed.

VI. Experimental Case Study

We finally demonstrate the effectiveness of the presented approach through experiments on a
testbed of PTZ visual sensor networks consisting of 5 PTZ cameras $\mathcal{V} = \{v_1, v_2, v_3, v_4, v_5\}$ (Fig.
19), where two of them ($v_1$ and $v_2$) are IPELA SNC-EP520 (SONY Corp.) and the other three
($v_3$, $v_4$ and $v_5$) are IPELA SNC-RZ25N (SONY Corp.). Note that the size of the acquired images
is $640 \times 480$ ($S_i \approx 3.0 \times 10^5$) for every $v_i \in \mathcal{V}$. In this experiment, all the algorithms including
image processing and the learning algorithm are run via Visual C++ (Microsoft Corp.).

Fig. 21. Sample image
Fig. 22. Projected image until 1500 round
Fig. 23. Projected image after 1500 round

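Let the five cameras monitor a ceiling of a room on which an image is projected as illustrated
in Fig. 20. Namely, we regard the ceiling as the environment and divide it into 130 squares
$\mathcal{R} = \{r_1, \cdots, r_{130}\}$, 10 cm on a side. Note that sensors $v_1, v_2, v_3, v_4, v_5$ are respectively marked
by purple, yellow, cyan, blue and red circles.

The action sets $\mathcal{A}_i = \Theta_i \times \Phi_i \times \Lambda_i$ are set as

\[ \Theta_1 = \{(5/180)\pi n_\theta \mid n_\theta \in \{-34, -33, \cdots, 34\}\}, \]

\[ \Theta_i = \{(5/180)\pi n_\theta \mid n_\theta \in \{-2, -1, \cdots, 20\}\}, \quad i = 2, 3, 4, 5, \]

\[ \Phi_i = \{(5/180)\pi n_\varphi \mid n_\varphi \in \{15, 16, \cdots, 18\}\}, \quad i \in \mathcal{V}, \]

\[ \Lambda_1 = \Lambda_2 = \{6.8\,\mathrm{mm}, 13.6\,\mathrm{mm}\}, \quad \Lambda_3 = \Lambda_4 = \Lambda_5 = \{8.2\,\mathrm{mm}, 16.4\,\mathrm{mm}\}. \]

Just to stabilize the evolution of the objective function, we introduce the constrained action sets

\[ \mathcal{C}_i(a_i) = \{(\theta'_i, \varphi'_i, \lambda'_i) \mid |\theta'_i - \theta_i| \leq (5/180)\pi,\ |\varphi'_i - \varphi_i| \leq (5/180)\pi,\ \lambda'_i \in \Lambda_i\} \]

for all $a_i \in \mathcal{A}_i$ and $v_i \in \mathcal{V}$, which clearly satisfies Assumption 1.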
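The discretized action sets and the one-step constraint can be sketched in a few lines. This is an illustration rather than the code used in the experiment: the function names are hypothetical, only the set printed for camera $v_1$ is built, and angles are stored as integer grid indices (multiples of 5 degrees) so that the constraint test avoids floating-point comparisons.

```python
import math

STEP_RAD = 5.0 / 180.0 * math.pi  # 5-degree grid spacing, in radians

def action_set(pan_indices, tilt_indices, focal_lengths_mm):
    """All (pan index, tilt index, focal length) triples of one camera."""
    return [(n_pan, n_tilt, lam)
            for n_pan in pan_indices
            for n_tilt in tilt_indices
            for lam in focal_lengths_mm]

def constrained_set(action, actions):
    """Actions reachable in one round: pan and tilt move by at most one grid
    step (5 degrees); the focal length is unrestricted."""
    n_pan, n_tilt, _ = action
    return [b for b in actions
            if abs(b[0] - n_pan) <= 1 and abs(b[1] - n_tilt) <= 1]

def to_radians(action):
    """Convert grid indices to (pan, tilt) angles in radians."""
    n_pan, n_tilt, lam = action
    return n_pan * STEP_RAD, n_tilt * STEP_RAD, lam

# Camera v_1 as printed above: pan indices -34..34, tilt indices 15..18.
A1 = action_set(range(-34, 35), range(15, 19), (6.8, 13.6))
interior = constrained_set((0, 16, 6.8), A1)
corner = constrained_set((-34, 15, 6.8), A1)
```

Because the constraint is symmetric in the two actions, $a'_i \in \mathcal{C}_i(a_i)$ holds exactly when $a_i \in \mathcal{C}_i(a'_i)$, which is the reversibility property used in the aperiodicity argument of Appendix A.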

The global objective function W and utility function Ui are selected as follows. Suppose that 
each sensor $v_i$ stores in memory a part of the sample image in Fig. 21 corresponding to each
action in Ai. Then, we employ ^ as I™J°{ai). Since ^ inherently embodies the function of 
Fig. 24. Evolution of global objective function (horizontal axis: round)



I^^^^{ai), this experiment does not use I'j^J'^^{ai), and the function Wij in © is chosen as 



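\[ W_{ij}(a_i) = \begin{cases}
\gamma, & \text{if } r_j \in \mathcal{R}_i(a_i) \text{ and } I^{\mathrm{info}}_{ij}(a_i) < \gamma \\
I^{\mathrm{info}}_{ij}(a_i), & \text{if } r_j \in \mathcal{R}_i(a_i) \text{ and } I^{\mathrm{info}}_{ij}(a_i) \geq \gamma \\
0, & \text{if } r_j \notin \mathcal{R}_i(a_i)
\end{cases} \tag{23} \]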



The positive parameter $\gamma > 0$ is introduced to place value on monitoring a region containing
no useful information in preparation for future environmental changes. In this experiment, we
set 7 = 1.5 X 10^^. We next define Wj and by dH) and and then scale them so that 
Assumption 3 is satisfied. Such a scaling is possible since the maximal value of $W_{ij}$ in (23)
can be easily estimated. Finally, the utility function $U_i$ is designed according to (8).

In this experiment, we first project the image in Fig. 22 on the environment, which differs from
the sample image in Fig. 21 in that a small hole appears. Then, we run the learning algorithm
with $\varepsilon = 0.015$ and $\kappa = 0.120$ for 1500 rounds. Note that all the sensors initially choose zoom-in
mode (the larger $\lambda_i$). After that, we change the image projected on the environment to the image
in Fig. 23 with a larger hole, and leave the state for 2500 rounds. Due to the nature of the
objective function, it is intuitively desirable that sensors capture the holes with high resolution
while keeping the total coverage area as wide as possible.

The experimental results are shown in Figs. 24, 25 and 26. Fig. 24 illustrates the evolution
of the global objective function. We can confirm from this figure that the actions are basically
selected so as to maximize the global objective function. Figs. 25 and 26 show the snapshots of
the coverage area and the acquired images at the times marked on Fig. 24, where green boxes
on the top left (large) pictures describe the fields of view.

Fig. 25. Snapshots of coverage area and acquired images for Fig. 22: (a) initial state, (b) 200 round, (c) 280 round, (d) 750 round

In Fig. 25(b), the sensor marked red widely covers the environment by choosing zoom-out mode
($\lambda$ = 6.8 mm) and the sensor marked blue captures half of the hole in zoom-in mode, which drives up
the global objective function. We see from Fig. 25(c) that $v_2$ (yellow) covers the remaining half of
the hole and also covers the unmonitored area, which also increases the objective function. Then,
after a while, they reach a desirable configuration in Fig. 25(d), where the hole is monitored by a
sensor in zoom-in mode and the remaining sensors achieve wide-ranging coverage by choosing
zoom-out mode while avoiding overlaps of fields of view.
Fig. 26. Snapshots of coverage area and acquired images for Fig. 23: (a) 1500 round, (b) 2700 round, (c) 3000 round, (d) 4000 round

Fig. 26(a) illustrates the configuration at the time when the image in Fig. 23 starts to be
projected. We see from Fig. 26(b) that the sensor marked red monitors the hole in zoom-in mode, and from
Fig. 26(c) that $v_3$ (cyan) also takes a similar action. Then, after the pan and tilt angles are finely
tuned to avoid overlaps, they eventually reach the desirable configuration depicted in Fig. 26(d).

All the above results show the effectiveness of the present approach. Let us finally emphasize
that the ideal results are achieved without using any prior information on when, where and how
the environmental changes occur.
VII. Conclusions

In this paper, we have investigated cooperative environmental monitoring for PTZ visual
sensor networks and presented a distributed solution to the problem based on game theoretic
cooperative control and payoff-based learning. We first have presented a novel optimal
environmental monitoring problem. Then, after constituting a potential game via an existing utility
design technique, we have presented a payoff-based learning algorithm based on [6] so that
the vision sensors are led to not just a Nash equilibrium but the potential function maximizers.
Finally, we have run experiments to demonstrate the effectiveness of the present approach.

The authors would like to thank Mr. S. Mori for his contributions in the experiments. 

Appendix A 
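Proof of Lemma 4

Condition (A2) in Definition 3 is straightforward from the structure of PHPIP. We thus prove
only (A1) and (A3) below.

Consider a feasible transition $z^1 \to z^2$ with $z^1 = (a^0, a^1) \in \mathcal{B}$ and $z^2 = (a^1, a^2) \in \mathcal{B}$, and
partition the set of sensors $\mathcal{V}$ according to their behaviors along with the transition as

\[ \Lambda_1 = \{v_i \in \mathcal{V} \mid U_i(a^1) \geq U_i(a^0),\ a_i^2 \in \mathcal{C}_i(a_i^1) \setminus \{a_i^1\}\}, \quad \Lambda_2 = \{v_i \in \mathcal{V} \mid U_i(a^1) \geq U_i(a^0),\ a_i^2 = a_i^1\}, \]
\[ \Lambda_3 = \{v_i \in \mathcal{V} \mid U_i(a^1) < U_i(a^0),\ a_i^2 \in \mathcal{C}_i(a_i^1) \setminus \{a_i^0, a_i^1\}\}, \]
\[ \Lambda_4 = \{v_i \in \mathcal{V} \mid U_i(a^1) < U_i(a^0),\ a_i^2 = a_i^1\}, \quad \Lambda_5 = \{v_i \in \mathcal{V} \mid U_i(a^1) < U_i(a^0),\ a_i^2 = a_i^0\}. \]

Then, the probability of transition $z^1 \to z^2$ is described by

\[ P^\varepsilon_{z^1 z^2} = \prod_{i \in \Lambda_1} \frac{\varepsilon}{|\mathcal{C}_i(a_i^1)| - 1} \prod_{i \in \Lambda_2} (1 - \varepsilon) \prod_{i \in \Lambda_3} \frac{\varepsilon}{|\mathcal{C}_i(a_i^1)| - b_i} \prod_{i \in \Lambda_4} (1 - \varepsilon) \kappa \varepsilon^{\Delta_i} \prod_{i \in \Lambda_5} (1 - \varepsilon)(1 - \kappa \varepsilon^{\Delta_i}), \tag{24} \]

where $\Delta_i := U_i(a^0) - U_i(a^1)$, $b_i = 1$ if $a_i^0 = a_i^1$ and $b_i = 2$ otherwise. We see from (24) that the resistance $\chi(z^1 \to z^2)$
of transition $z^1 \to z^2$ defined in (16) is equal to $|\Lambda_1| + |\Lambda_3| + \sum_{i \in \Lambda_4} \Delta_i$ since

\[ 0 < \lim_{\varepsilon \to 0} \frac{P^\varepsilon_{z^1 z^2}}{\varepsilon^{|\Lambda_1| + |\Lambda_3| + \sum_{i \in \Lambda_4} \Delta_i}} = \prod_{i \in \Lambda_1} \frac{1}{|\mathcal{C}_i(a_i^1)| - 1} \prod_{i \in \Lambda_3} \frac{1}{|\mathcal{C}_i(a_i^1)| - b_i} \, \kappa^{|\Lambda_4|} < \infty \tag{25} \]

holds. Thus, (A3) in Definition 3 is satisfied.

Let us next check (A1) in Definition 3. From the rule of taking exploratory actions in Algorithm
1 and the second item of Assumption 1, we immediately see that the set of the states accessible
from any $z \in \mathcal{B}$ is equal to $\mathcal{B}$. This implies that the perturbed Markov process $\{P^\varepsilon\}$ is irreducible.
We next check aperiodicity of $\{P^\varepsilon\}$. It is clear that any state in $\mathrm{diag}(\mathcal{A})$ has period 1. Let us
next pick any $(a^0, a^1)$ from the set $\mathcal{B} \setminus \mathrm{diag}(\mathcal{A})$. Since $a_i^0 \in \mathcal{C}_i(a_i^1)$ holds iff $a_i^1 \in \mathcal{C}_i(a_i^0)$
from Assumption 1, the following two paths are both feasible: $(a^0, a^1) \to (a^1, a^0) \to (a^0, a^1)$,
$(a^0, a^1) \to (a^1, a^1) \to (a^1, a^0) \to (a^0, a^1)$. This implies that the period of state $(a^0, a^1)$ is 1
and the process $\{P^\varepsilon\}$ is proved to be aperiodic. Hence the process $\{P^\varepsilon\}$ is both irreducible and
aperiodic, which means (A1) in Definition 3.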
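The five cases $\Lambda_1, \cdots, \Lambda_5$ translate directly into the next-action distribution of a single sensor. The sketch below is reconstructed from those cases only, not from the listing of Algorithm 1; the function name, the argument names, and the fallback used when no exploratory action is available are assumptions made for illustration.

```python
def phpip_distribution(a_prev2, a_prev1, u_prev2, u_prev1, candidates, eps, kappa):
    """Probabilities of one sensor's next action under a constant exploration rate.

    a_prev2, a_prev1: the sensor's actions two rounds and one round ago.
    u_prev2, u_prev1: the utilities it observed in those rounds.
    candidates: the constrained set C_i(a_prev1), including a_prev1 itself.
    """
    improved = u_prev1 >= u_prev2
    excluded = {a_prev1} if improved else {a_prev1, a_prev2}
    pool = [a for a in candidates if a not in excluded]
    explore = eps if pool else 0.0  # assumption: no exploration if nothing is left
    dist = {a: explore / len(pool) for a in pool}  # cases Lambda_1 and Lambda_3
    stay = 1.0 - explore
    if improved:
        dist[a_prev1] = dist.get(a_prev1, 0.0) + stay  # case Lambda_2
    else:
        irrational = kappa * eps ** (u_prev2 - u_prev1)
        # case Lambda_4: keep the worse action (irrational decision)
        dist[a_prev1] = dist.get(a_prev1, 0.0) + stay * irrational
        # case Lambda_5: return to the previous, better action
        dist[a_prev2] = dist.get(a_prev2, 0.0) + stay * (1.0 - irrational)
    return dist

# Hypothetical usage with the parameters reported for the experiment.
worse = phpip_distribution("a", "b", 0.5, 0.3, ["a", "b", "c", "d"], 0.015, 0.12)
better = phpip_distribution("a", "b", 0.3, 0.5, ["a", "b", "c", "d"], 0.015, 0.12)
```

Multiplying such per-sensor probabilities over all sensors gives a joint transition probability of the form (24), and the exponents of $\varepsilon$ add up to the resistance $|\Lambda_1| + |\Lambda_3| + \sum_{i \in \Lambda_4} \Delta_i$.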

Appendix B 
Proof of Lemma 5

Because of the rule at Step 2 of PHPIP, it is clear that any state belonging to $\mathrm{diag}(\mathcal{A})$ cannot
move to another state without explorations, which implies that each state in $\mathrm{diag}(\mathcal{A})$ itself
forms a recurrent communication class of the unperturbed Markov process $\{P^0\}$.

Let us consider the states in $\mathcal{B} \setminus \mathrm{diag}(\mathcal{A})$ and prove that such states are never included in
the recurrent communication classes of the unperturbed process $\{P^0\}$. Here, we use induction.
We first consider $n = 1$. If $U_1(a_1^1) \geq U_1(a_1^0)$, then the transition $(a_1^0, a_1^1) \to (a_1^1, a_1^1)$ is taken.
Otherwise, a sequence of transitions $(a_1^0, a_1^1) \to (a_1^1, a_1^0) \to (a_1^0, a_1^0)$ occurs. Thus, for $n = 1$, the
state $(a_1^0, a_1^1) \in \mathcal{B} \setminus \mathrm{diag}(\mathcal{A})$ is never included in recurrent communication classes of $\{P^0\}$.

We next make a hypothesis that there exists an $n' \in \mathbb{Z}_+$ such that all the states in $\mathcal{B} \setminus \mathrm{diag}(\mathcal{A})$
are not included in recurrent communication classes of the unperturbed Markov process $\{P^0\}$
for all $n \leq n'$. Then, we consider the case $n = n' + 1$, where there are three possible cases:

(i) $U_i(a^1) \geq U_i(a^0)\ \forall i \in \mathcal{V} = \{1, \cdots, n' + 1\}$,

(ii) $U_i(a^1) < U_i(a^0)\ \forall i \in \mathcal{V} = \{1, \cdots, n' + 1\}$,

(iii) $U_i(a^1) \geq U_i(a^0)$ for $n''$ agents where $n'' \in \{1, \cdots, n'\}$.

In case (i), the transition $(a^0, a^1) \to (a^1, a^1)$ must occur for $\varepsilon = 0$ and, in case (ii), the transition
$(a^0, a^1) \to (a^1, a^0) \to (a^0, a^0)$ should be selected. Thus, all the states in $\mathcal{B} \setminus \mathrm{diag}(\mathcal{A})$ satisfying
(i) or (ii) are never included in recurrent communication classes.

In case (iii), at the next iteration, all the agents $i$ satisfying $U_i(a^1) \geq U_i(a^0)$ choose the current
action. Then, such agents possess a single action in the memory and, in case of $\varepsilon = 0$, each
agent has to choose either of the actions in the memory. Namely, these agents never change
their actions in all subsequent iterations. The resulting situation is thus the same as the case of
$n = n' + 1 - n''$. From the above hypothesis, we can conclude that the states in case (iii) are
also not included in recurrent communication classes. In summary, the states in $\mathcal{B} \setminus \mathrm{diag}(\mathcal{A})$ are
never included in the recurrent communication classes of $\{P^0\}$. The proof is thus completed.

Appendix C 
Proof of Lemma 6

Along with the straight route, the sensor $v_i$ such that $a_i^1 \neq a_i^2$ first explores from $a_i^1$ to $a_i^2$,
whose probability is $(1 - \varepsilon)^{n-1} \varepsilon / (|\mathcal{C}_i(a_i^1)| - 1)$. This implies that the resistance of the transition
$z^1 = (a^1, a^1) \to (a^1, a^2)$ is equal to 1.

We next consider the transition from $(a^1, a^2)$ to $z^2 = (a^2, a^2)$. If $U_i(a^2) \geq U_i(a^1)$ is true, the
probability of this transition is $(1 - \varepsilon)^n$, whose resistance is equal to 0. Otherwise, the inequality
$U_i(a^2) < U_i(a^1)$ holds and the probability of this transition is equal to $(1 - \varepsilon)^n \times \kappa \varepsilon^{\Delta_i}$, whose
resistance is $\Delta_i$. See Fig. 14 for the graphic description of the above sentences. Let us now
notice that the resistance $\chi(\rho)$ of the straight route $\rho$ is equal to the sum of the resistances
of transitions $(a^1, a^1) \to (a^1, a^2)$ and $(a^1, a^2) \to (a^2, a^2)$ from (18), and that $\Delta_i < 1/2$ from
Assumption 3. Hence, we can conclude that $\chi(\rho)$ is smaller than 3/2.

Let us next prove that the above resistance is minimal among all paths from $z^1$ to $z^2$. Suppose
now that there is a path $\rho'$ other than the straight route $\rho$ such that $\chi(\rho') < \chi(\rho) < 3/2$. Then,
the path can accept only one exploration of one sensor since two explorations lead to resistance
2. We see from Algorithm 1 that any sensor with $a_i(k-1) = a_i(k-2)$ would not take an
action other than $a_i(k-1)$ without exploration regardless of the other sensors' actions. Thus,
the sensor taking exploration has to be $v_i$ such that $a_i^1 \neq a_i^2$.

If we denote the chosen action through the exploration by $a'_i$, then the available joint action
in the future is limited to $a^1$ and $(a'_i, a^1_{-i})$ since no exploration will be taken. Thus, in order
that $z^2$ will be chosen in the future, $a'_i$ must be equal to $a_i^2$, and then either of $a^1$ and $a^2$ can
occur afterward. Accordingly, the only way to reach $z^2$ at a round is to follow the transition
$(a^1, a^2) \to (a^2, a^2)$, whose resistance is the same as $\chi(\rho) - 1$. This contradicts the assumption
of $\chi(\rho') < \chi(\rho)$, and hence the proof is completed.

Appendix D 
Proof of Lemma 7

As shown in Appendix C, the resistance of a straight route must be equal to 1 or $1 + \Delta_i \in
(1, 3/2)$. Suppose now that the route $\rho$ contains $M_\rho$ straight routes with resistance greater than 1,
and $\rho'$ contains $M_{\rho'}$ such straight routes. Let us also denote by $v_{i_l}$ the sensor taking exploration
along with the straight route $z^{(l)} \Rightarrow z^{(l+1)}$ in $\rho$. Then, the sensor $v_{i_l}$ also takes exploration along
with $z^{(l)} \Leftarrow z^{(l+1)}$ in $\rho'$. We also use the notations

\[ \Delta_l := U_{i_l}(a^{(l)}) - U_{i_l}(a^{(l+1)}), \quad \Delta'_l := -\Delta_l. \]

Since $z^{(l)} \Rightarrow z^{(l+1)}$ is a straight route and hence only $v_{i_l}$ changes its action along with the route,
the following equation holds from Lemma 1 and (10).

\[ \Delta_l = U_{i_l}(a^{(l)}) - U_{i_l}(a^{(l+1)}) = \phi(a^{(l)}) - \phi(a^{(l+1)}) \tag{26} \]

From the proof of Lemma 6, the resistance of $z^{(l)} \Rightarrow z^{(l+1)}$ in $\rho$ should satisfy

\[ \begin{cases} 1, & \text{if } U_{i_l}(a^{(l+1)}) \geq U_{i_l}(a^{(l)}) \\ 1 + \Delta_l \in (1, 3/2), & \text{if } U_{i_l}(a^{(l+1)}) < U_{i_l}(a^{(l)}), \end{cases} \]

while the resistance of $z^{(l)} \Leftarrow z^{(l+1)}$ in $\rho'$ is given as

\[ \begin{cases} 1 + \Delta'_l \in (1, 3/2), & \text{if } U_{i_l}(a^{(l+1)}) > U_{i_l}(a^{(l)}) \\ 1, & \text{if } U_{i_l}(a^{(l+1)}) \leq U_{i_l}(a^{(l)}). \end{cases} \]

Namely, either of the resistances of $z^{(l)} \Rightarrow z^{(l+1)}$ and $z^{(l+1)} \Rightarrow z^{(l)}$ is exactly 1 and the other is
greater than 1 (Fig. 27) except for the case that $U_{i_l}(a^{(l+1)}) = U_{i_l}(a^{(l)})$ in which the resistances
are both equal to 1. Let us now collect all the $\Delta_l$ such that the resistance of $z^{(l)} \Rightarrow z^{(l+1)}$ is
greater than 1 and renumber them as $\Delta_1, \cdots, \Delta_{M_\rho}$. Similarly, we define $\Delta'_1, \cdots, \Delta'_{M_{\rho'}}$ for the
reverse route $\rho'$. Then, from (26), we obtain

\[ \Delta_1 + \cdots + \Delta_{M_\rho} - (\Delta'_1 + \cdots + \Delta'_{M_{\rho'}}) = \phi(a^1) - \phi(a^2). \tag{27} \]

Note that (27) holds even in the presence of pairs $(a^{(l)}, a^{(l+1)})$ such that $U_{i_l}(a^{(l+1)}) = U_{i_l}(a^{(l)})$.
Since $\Delta_1 + \cdots + \Delta_{M_\rho} = \chi(\rho) - (M - 1)$ and $\Delta'_1 + \cdots + \Delta'_{M_{\rho'}} = \chi(\rho') - (M - 1)$ from (18),
we obtain $\chi(\rho) = \chi(\rho') + \phi(a^1) - \phi(a^2)$, which means the statement of this lemma.
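The identity $\chi(\rho) = \chi(\rho') + \phi(a^1) - \phi(a^2)$ is easy to verify numerically. The sketch below assumes, as in (26), that the utility change of the exploring sensor equals the change in potential, so a route is described by the potential values of the joint actions it visits; the values in `phis` are hypothetical and chosen so that every drop is below 1/2, as Assumption 3 requires.

```python
def route_resistance(potentials):
    """Resistance of an M-straight-route visiting joint actions whose potential
    values are given in order: each hop costs 1, plus the potential drop if
    the hop decreases the potential."""
    return sum(1.0 + max(0.0, p - q) for p, q in zip(potentials, potentials[1:]))

# Hypothetical potential values phi(a^(1)), ..., phi(a^(M)) along the route.
phis = [0.9, 0.6, 0.75, 0.4]
forward = route_resistance(phis)         # route rho from z^1 to z^2
backward = route_resistance(phis[::-1])  # reverse route rho' from z^2 to z^1
gap = forward - backward
```

The gap between the two resistances equals the difference of the end-point potentials, so the route that descends in potential is always the more costly one.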

Appendix E 
Proof of Lemma 8

The edges of $G_r$, denoted by $\mathcal{E}_r$, are divided into $\mathcal{E}_s$ in (20) and $\mathcal{E}_d := \mathcal{E}_r \setminus \mathcal{E}_s$. From Lemma
6, the weights of the edges in $\mathcal{E}_s$ are smaller than 3/2. We next consider the weights of an edge
from $z^1 = (a^1, a^1) \in \mathrm{diag}(\mathcal{A})$ to $z^2 = (a^2, a^2) \in \mathrm{diag}(\mathcal{A})$ such that $(z^1, z^2) \in \mathcal{E}_d$. Then, there
exist two or more sensors such that $a_i^1 \neq a_i^2$, or only one sensor $v_i$ such that $a_i^1 \neq a_i^2$ satisfies
$a_i^2 \notin \mathcal{C}_i(a_i^1)$. In both cases, at least two explorations must happen to reach $z^2$ and hence the
resistance of any path in $\mathcal{E}_d$ has to be at least 2. Namely, we have

\[ w_{l_1 l'_1} < w_{l_2 l'_2} \quad \forall (H_{l_1}, H_{l'_1}) \in \mathcal{E}_s \text{ and } (H_{l_2}, H_{l'_2}) \in \mathcal{E}_d. \tag{28} \]

Fig. 27. Resistance of a straight route (the numbers around arrows describe resistances of paths). Case A: $U_{i_l}(a^{(l+1)}) > U_{i_l}(a^{(l)})$. Case B: $U_{i_l}(a^{(l+1)}) < U_{i_l}(a^{(l)})$.

We next form a graph $G'_r = (\mathcal{H}, \mathcal{E}_r, \mathcal{W}'_r)$ by just reversing the weights of all edges over
graph $G_r$. Namely, the weight $w_{ll'}$ on $G_r$ is equal to the weight $w'_{l'l}$ on $G'_r$. Let us now apply
the Chu-Liu/Edmonds Algorithm [33] to the graph $G'_r$ and compute the minimal tree with a
root $H_r$ such that there is a unique path from $H_r$ to any node (the directions of edges are
opposite to the tree defined in Subsection V-A). Then, it is not difficult to confirm that reversing
the directions of all edges of the minimal tree yields the minimal resistance tree with root $H_r$
over $G_r$. Hence, it is sufficient to prove that the Chu-Liu/Edmonds Algorithm provides a tree
consisting only of edges in $\mathcal{E}_s$.

In the algorithm [33], every node in $\mathcal{H} \setminus \{H_r\}$ initially chooses the incoming edge with
the minimal weight. Then, only edges in $\mathcal{E}_s$ can be chosen at the initial step from (28). If the
resulting graph $\mathcal{G}$ consisting only of such edges is acyclic, then the minimum spanning tree is
formed and the statement of this lemma is true. Otherwise, there is at least one cycle in $\mathcal{G}$.

We next focus on one of such cycles denoted by $G_{cyc} = (\mathcal{H}_{cyc}, \mathcal{E}_{cyc}, \mathcal{W}_{cyc})$, where all edges
in $\mathcal{E}_{cyc}$ have to be contained in $\mathcal{E}_s$. In the following, the weight of the edge in $\mathcal{E}_{cyc}$ entering a
node $H_l \in \mathcal{H}_{cyc}$ is denoted by $w^l_{cyc}$, and we define $w_{cyc} := \min_{H_l \in \mathcal{H}_{cyc}} w^l_{cyc}$. Then, each node
$H_l$ in $\mathcal{H}_{cyc}$ computes the temporal weights for all edges from $H_{l'} \notin \mathcal{H}_{cyc}$ to $H_l$ over $G'_r$ by

\[ \tilde{w}_{l'l} = w'_{l'l} - w^l_{cyc} + w_{cyc}, \tag{29} \]

and identifies a node $H_{l'}$ providing the minimal $\tilde{w}_{l'l}$, where such a node $H_{l'}$ is denoted by $H_{\hat{l}}$
and the corresponding $\tilde{w}_{l'l}$ is denoted by $\tilde{w}_l$. Then, we seek $H_{l^*} \in \mathcal{H}_{cyc}$ with the minimal $\tilde{w}_l$
and replace the edge entering $H_{l^*}$ over $\mathcal{G}$ by the edge $(H_{l^*}, H_{\hat{l}^*})$.

The Chu-Liu/Edmonds Algorithm repeats the above process and eventually finds the minimum
spanning tree. Namely, if we can prove that $(H_{l^*}, H_{\hat{l}^*})$ must be included in $\mathcal{E}_s$, the statement of
the lemma is true. Now, notice that there exists at least one $H_l \in \mathcal{H}_{cyc}$ such that the set

\[ \mathcal{H}_l = \{H_{l'} \in \mathcal{H} \setminus \mathcal{H}_{cyc} \mid (H_l, H_{l'}) \in \mathcal{E}_s\} \]

is not empty from Assumption 1. For such a node $H_l$, the node $H_{\hat{l}}$ must be chosen from $\mathcal{H}_l$ since
(28) holds and the second and third terms in (29) are common for all options of $H_{l'} \notin \mathcal{H}_{cyc}$.
Then, $w'_{l'l}$ in (29) must be smaller than 3/2 since $(H_l, H_{\hat{l}}) \in \mathcal{E}_s$, and $-w^l_{cyc} + w_{cyc} \in (-1/2, 0]$
holds since $w^l_{cyc} \in [1, 3/2)$, $w_{cyc} \in [1, 3/2)$, and $w_{cyc} := \min_{H_l \in \mathcal{H}_{cyc}} w^l_{cyc}$. Namely, $\tilde{w}_l$ must be
smaller than 3/2 for all $H_l$ such that $\mathcal{H}_l$ is not empty. In contrast, for any $H_l$ such that $\mathcal{H}_l$
is empty, $w'_{l'l}$ in (29) must be greater than or equal to 2 because of $(H_l, H_{l'}) \in \mathcal{E}_d$. Then, by
using $-w^l_{cyc} + w_{cyc} \in (-1/2, 0]$ again, $\tilde{w}_l$ is proved to be greater than 3/2 and such $H_l$ is never
selected as $H_{l^*}$. Namely, $(H_{l^*}, H_{\hat{l}^*})$ must be contained in $\mathcal{E}_s$. This completes the proof.